


UNIVERSITY OF ILLINOIS BULLETIN 

Issued Weekly 
Vol. XIX January 30, 1922 No. 22 

[Entered as second-class matter December 11, 1912, at the post office at Urbana, Illinois, under the
Act of August 24, 1912. Acceptance for mailing at the special rate of postage provided for in
section 1103, Act of October 3, 1917, authorized July 31, 1918.]



BULLETIN NO. 8 

BUREAU OF EDUCATIONAL RESEARCH 
COLLEGE OF EDUCATION 



A CRITICAL STUDY OF CERTAIN 
SILENT READING TESTS 



By 
Walter S. Monroe, Director 




Price 50 Cents 



PUBLISHED BY THE UNIVERSITY OF ILLINOIS 
URBANA 















TABLE OF CONTENTS 



Preface
The Measurement of Silent Reading Ability
The Problem
The Data Collected
The Performances Required of a Pupil
Description of Pupils' Performances
Scoring Reproductions
The Idea-Counting Method
Brown's Method of Idea Counting
The Word-Counting Method
Subjectivity of Describing Reproductions
Constant Errors and Variable Errors
Summary for Describing Reproductions
Scoring Answers to Questions
Describing the Quality of Compositions
Time Required for Scoring Test Papers
Average Scores and Standard Deviations
Equivalence of Duplicate Forms
Relation of Vocabulary to Difficulty
Formation of Composite Scores
Reliability
Methods of Determining Reliability
Probable Error of r Due to Sampling
Reliability of Tests Studied
Discrimination
Comparison with Teachers' Ratings
Correlation of Comprehension with Memory
Corrected Coefficients of Correlation
Correlation of Comprehension with Vocabulary
Correlation of Cancellation Scores with Measures of Rate of Reading
Correlation of Comprehension with Written Composition
Inter-correlation between Tests
Correlation of Single Tests with Composites
Summary of Conclusions
Correlation with Composites



PREFACE 

In the field of silent reading, as well as in the fields of other 
school subjects, the number of available educational tests has been 
increased so that one desiring to use a test is confronted with the 
necessity of making a choice. If such a choice is to be made intelligently
it is necessary to have at hand experimental data with reference
to the reliability and validity of the tests considered. The study
which is reported in this monograph was undertaken for the purpose 
of securing such data with reference to certain silent reading tests. 
The report is presented in hopes that users of silent reading tests 
will find the information that it contains helpful in making an intel- 
ligent selection of educational tests in this field. The monograph 
will doubtless also be of interest to students in the field of educa- 
tional measurements. 

Walter S. Monroe, 

Director, Bureau of Educational Research. 



A CRITICAL STUDY OF CERTAIN SILENT 
READING TESTS 

The measurement of silent reading ability. The scores yielded 
by silent reading tests may fail to be true measures of silent reading 
ability for two reasons. First, the scores may not be reliable or ac- 
curate. A score is lacking in reliability when two applications of a 
test or of duplicate forms of it do not yield approximately the same 
score when administered to the same pupils, as far as possible, 
under the same conditions. Included in this is any lack of objectivity 
in the scoring of the test. Second, the performance which a pupil 
gives on a silent reading test may depend upon other factors in such 
a way that it is an index of these factors rather than of silent read- 
ing ability. For example, when a pupil answers questions from 
memory his answers may be influenced to such an extent by his 
ability to remember that his performance is not a truthful index of 
his ability to read silently. 

Two aspects of the activity of silent reading may be recognized. 
First, the reading mechanism consists of perception, eye-movement 
habits, etc. The rate of silent reading is largely dependent upon this 
mechanism and hence any measure of rate is an index or symptom 
of the quality of the mechanism. Second, the thought-getting or 
comprehension aspect of silent reading involves the higher mental 
processes. The quality of this is indicated by the comprehension 
scores. Comprehension is not entirely independent of the mechan- 
ism of silent reading, but, if sufficient time is allowed, pupils who 
possess poor reading mechanism may stand high in thought-getting. 

The problem. The problem of this study is to ascertain the
reliability and, so far as possible, the function and validity of certain 
silent reading tests. These tests, as will be shown later, differ in 
the performances which are required of the pupils. They also differ 
in other respects. Their titles suggest that all of the silent reading 
tests included in this study are designed to measure silent reading 
ability. The fact that they differ widely in certain respects suggests 
the possibility that no two of them measure the same type of read- 
ing ability, or at least that they do this with different degrees of 
validity. The study has been restricted to tests which yield some 
measure of the rate of reading as well as a measure of comprehension
in order that the measurement of both phases of silent reading
activity might be studied. With one exception, the tests which have
been used have duplicate forms. In addition to the silent reading
tests, certain other tests were given to the same pupils, because it was
thought that the scores yielded by them might assist in the analysis 
and interpretation of the scores yielded by the silent reading tests. 

The data collected. Through the courtesy of Superintendent W. 
W. Earnest and certain teachers of the Champaign Public Schools, 
the tests chosen for this study were given in the spring of 1920 to 
a number of pupils in the fourth and seventh grades. All of the 
tests were administered by Miss Dora Keen, at that time a research 
assistant in the Bureau of Educational Research. Care was exer- 
cised to secure as nearly uniform testing conditions as can be ob- 
tained in the ordinary schoolroom. The lapse of time between the 
giving of the different forms of the same test was made as nearly 
equal as possible for the different groups. Only in rare instances 
were tests given after recess in the afternoon or during the afternoon 
session on Friday. The tests were given to all pupils in four rooms 
in both the fourth and seventh grades. The total number of pupils 
tested in each grade was approximately 140. The study is, however, 
based upon the records of only those pupils who took all of the tests. 
The number of complete records in the fourth grade is 80 and in the 
seventh grade, 91. 

The following tests were given in the fourth grade: 

1. The Courtis Silent Reading Test No. 2,¹ Form 1, "The
Kitten Who Played May Queen," and Form 3, "The Kitten Who
Caught a Fish."

2. Brown's Silent Reading Test,² Form 1, "The Long Slide,"
and Form 2, "A Morning Adventure."

3. Monroe's Standardized Silent Reading Test I,³ Forms
1, 2, and 3.



¹Courtis Silent Reading Test No. 2. Forty-sixth Annual Report. Kansas City,
Missouri: Board of Education, 1917. pp. 79-85.

²Brown, H. A. "The Measurement of Ability to Read." A Manual of Directions
Concerning Giving and Scoring of Reading Tests, Statistical Treatment of
the Data and Diagnosis of School Class and Individual Needs. Concord: New
Hampshire Department of Public Instruction (in cooperation with the General
Education Board). Bureau of Research Bulletin No. 1, Second Edition, 1916.
Pp. 57.

³Monroe, W. S. "Monroe's Standardized Silent Reading Tests." Journal of
Educational Psychology, 9:303-12, June, 1918.



4. Fordyce's Scale for Measuring Achievement in Reading,⁴
Test No. 1, "Narcissus."

5. Experimental Reproduction Test I, Form 1, based on pages
84 and 85 of the supplementary reader, "The Strike at Shane's,"⁵
and Form 2 based on pages 6 and 7 of the same publication. The
passage for Form 1 contains 370 words and that for Form 2, 395
words. In administering these tests the pupils read from the sup- 
plementary reader. The exact place of beginning had been marked 
in each copy. Also the end of the passage to be read was indicated. 

6. Cross-Out Silent Reading Test I, Form i and Form 2. This 
is an experimental silent reading test. In a passage of rather simple 
reading material, words were substituted, which did not agree with 
the meaning of the preceding words in the sentence. A pupil is asked 
to cross out the words which do not fit. With the exception of the 
substituted words, the selection is a connected story. 

7. Vocabulary Test. The words of this test are those used by
Terman and Childs. The form of the test is that proposed by
Whipple.⁶

8. Cancellation Test, "a-t" and "e-r."⁷

9. Memory Test, "How Mr. Lincoln Helped the Pig."⁸

The following tests were given in the seventh grade:

1. Starch's Silent Reading Test No. 6 and Test No. 7.⁹

2. Monroe's Standardized Silent Reading Test II, Forms 1,
2, and 3.

3. Fordyce's Scale for Measuring Achievement in Reading, 
Test No. 2, "Spirit of Spring." 



⁴Fordyce, Charles. "A Scale for Measuring the Achievements in Reading."
The University Publishing Company, Lincoln, Nebraska, and Chicago. 1916.

⁵"The Strike at Shane's." (Gold Mine Series, No. 2.) Boston: American
Humane Education Society, 1908. Pp. 91.

(A supplementary reader for the fourth grade which has as its lesson kindness
to domestic animals.)

⁶Whipple, G. M. Manual of Mental and Physical Tests, Complex Processes,
Chapter 12. Baltimore: Warwick and York, 1914.

⁷This test is described by Whipple in the Manual of Mental and Physical Tests,
Simpler Processes, p. 311.

⁸Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
pp. 207-10.

⁹Starch, Daniel. The Measurement of Efficiency in Reading. Journal of Educational
Psychology, 4:1-24, 1915. These tests were used as duplicate forms.



4. Experimental Reproduction Test II, Form 1, based on
pages 6 and 7 of the supplementary reader, "Old English Heroes,"¹⁰
and Form 2, based upon pages 8 and 9 of the same publication.
The passage for Form 1 contains 662 words, and that for Form 2,
611 words.

5. Cross-Out Silent Reading Test II, Form i and Form 2. 
This test is similar to the Cross-Out Silent Reading Test used in the
fourth grade but is based upon more difficult material. 

6. Pressey Silent Reading Test for Grades VI, VII, and VIII,
Form 1 and Form 2. This is an experimental test.

7. Vocabulary Test. This is the same test as that used in the 
fourth grade. 

8. Cancellation Test, "a-t" and "e-r." This is also the same 
as that used in the fourth grade. 

9. Memory Test, "Marble Statue."¹¹

10. Composition Test. The Willing Composition Scale¹² and
the directions which accompany it were used.

In addition to the above tests a rating for ability in silent read- 
ing was secured from the teachers. To guide them in making this 
rating, the teachers were given the following directions: 

Think of all the fourth (seventh) grade pupils with whose silent reading 
ability you have ever become acquainted from the best to the poorest. Compare 
each child in your present class with this distribution of pupils. Give a pupil a 
rating of 5 if he has very superior ability in silent reading equalled only by about
seven out of every hundred, or 7 percent of fourth (seventh) grade pupils. Give 
him a rating of 4 if he has superior ability or ability above the average, yet is ex- 
celled by the very superior group. About 24 out of every hundred, or 24 percent 
of fourth (seventh) grade pupils, will fall in the superior group. Give him a rating 
of 3 if he possesses average ability, i. e., ability which lies somewhere close to the 
middle of the difference between the very best pupil and the very poorest. About 
38 out of every hundred, or 38 percent of fourth (seventh) grade pupils, will fall 
in this average group. If the pupil is below the average in ability to read and yet
does not equal the poorest you have ever known give him a rating of 2. This group
is called inferior and will contain about 24 out of every hundred, or 24 percent of
fourth (seventh) grade pupils. Give the pupil a rating of 1 if he is very inferior
in ability to read so that he is as poor or very nearly as poor as the poorest pupil
you have ever known. About 7 out of every hundred, or 7 percent of fourth
(seventh) grade pupils, will fall in this very inferior group.

The above directions do not mean that you will necessarily be obliged to give
7 percent of your class a rating of 5; 24 percent, a rating of 4; 38 percent, a rating
of 3; 24 percent, a rating of 2; and 7 percent, a rating of 1. They do mean, however,
that a large number of pupils, a number running up into the hundreds, can be
divided in exactly this manner, i. e., 7 percent, very superior; 24 percent, superior;
38 percent, average; 24 percent, inferior; and 7 percent, very inferior. You are to
think of all the pupils you have ever known from the best to the poorest and by
comparison give each pupil in your present class the rating he would receive if
he were included with all the pupils you have known and the entire number should
be rated in the above manner.

¹⁰Bush, Bertha E. Old English Heroes. (Instructor Literature Series, No.
116.) Danville, N. Y., and Chicago: F. A. Owen Publishing Co., and Hall and
McCreary, 1909. Pp. 31.

This is a supplementary reader suitable for the upper elementary grades. It
contains brief sketches of the lives of Alfred the Great, Richard the Lion-Hearted,
and the Black Prince.

¹¹Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
pp. 107-10.

¹²Willing, M. H. Measurement of Written Composition in Grades IV to VIII,
English Journal, 7:193-202, March, 1918.

The performances required of a pupil. All of the silent reading 
tests in the above list are designed to measure the ability to read
silently. However, they require a variety of performances from the
pupil. In the Courtis Silent Reading Test No. 2, the pupil is re- 
quired to read a continuous selection for three minutes. At the end 
of this time he turns to another section of the test and answers ques- 
tions based upon the selection he has just read. The questions are 
to be answered by either "yes" or "no." The selection read is re- 
peated in connection with the questions so that the pupil may refer 
to it in case he does not remember the answer to any question. The 
Brown Silent Reading Test and the Starch Silent Reading Tests 
require the pupil to read a selection and then reproduce what he can 
remember. Starch allows thirty seconds reading time, while Brown 
allows one minute. The Monroe Standardized Silent Reading Tests 
consist of a series of exercises. Each exercise consists of one para- 
graph and a question based on it. Most of the answers are to be 
given by drawing a line under a word. Five minutes are allowed 
for the test. The Fordyce Scale for Measuring Achievement in 
Silent Reading¹³ requires the pupil to read a selection and then answer
from memory questions based on it. The selection for Test 1
contains 300 words. The time allowance is 125 seconds. The selec- 
tion for Test 2 contains 512 words with a time allowance of 140 
seconds. The time allowed for the reading is intended to be such 
that 50 percent of the pupils will finish before time is called. 
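The pace implied by each time allowance can be worked out from the figures just quoted. The sketch below is an illustration only and is not part of the Fordyce directions; the function name is hypothetical, and it assumes that a pupil who finishes exactly as time is called has read every word of the selection.

```python
def implied_rate_wpm(words: int, seconds: float) -> float:
    """Words per minute needed to finish `words` in exactly `seconds`."""
    if seconds <= 0:
        raise ValueError("time allowance must be positive")
    return words * 60 / seconds


# Figures quoted in the text for the two Fordyce selections.
test_1 = implied_rate_wpm(300, 125)  # fourth grade selection
test_2 = implied_rate_wpm(512, 140)  # seventh grade selection
print(round(test_1, 1), round(test_2, 1))  # prints: 144.0 219.4
```

Since the allowance is intended to let about half of the pupils finish, these figures approximate the median rate each test presupposes: about 144 words per minute for Test 1 and about 219 for Test 2.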

¹³This test has only one form. Test 1 was given in the fourth grade and Test
2 in the seventh grade.

The directions which accompany the Fordyce Scale for Measuring
Achievement in Silent Reading are stated in general terms.
For this reason it was necessary to formulate the exact explana- 
tion to be given to the pupils. The following was used: 

Do not turn over your paper until I tell you to begin. These papers have 
a story on them. You are to read the story at your ordinary rate of reading, care- 
fully enough so that you will be able to reproduce the leading thoughts. When 
I say "mark," draw a line around the word at which you are looking at that time. 
If you have not finished go right on reading until you come to the end of the 
story. Then immediately turn your paper face down and sit quietly until all have 
finished. You are to read the story once and once only, and just as soon as you 
have finished, turn your paper down. Is there any one who does not understand 
exactly what to do? All right! Begin!

In the Experimental Reproduction Test the following directions 
were used: 

Do not open your books until I tell you to begin. Write your name and school
on the card.¹⁴

This is a test to find out how rapidly and how well you can read. 
Read carefully; for you will be asked to write out what you have read. Put your 
finger in the book this way (illustrating). When I say "begin" open your books 
and begin to read at the first blue mark here (illustrating). When I say, "mark," 
draw a line around the word at which you are looking, (illustrate), then go right 
on reading until you come to the last blue mark. Then close your book and sit 
quietly until all have finished. Read over only once. Do not forget to draw a 
line around the word where you are reading when I say, "mark." Is there anyone 
who does not understand just what he is to do? All right! Begin! 

The time allowance was thirty seconds. After they had com- 
pleted the reading, the pupils were asked to write, in as nearly the 
same words as possible, all that they had read. This reproduction 
completed, they were asked to answer a list of questions based upon 
the selection read. They were not given an opportunity to consult 
the reproduction nor to add to it after answering the questions. 

The nature of the Cross-Out Silent Reading Test is illustrated 
by the directions given to the seventh grade pupils: 

Below you will find a paragraph of a story. Certain words in this paragraph do 
not belong there, that is, they do not make sense and do not agree with what has 
gone before. Read this paragraph carefully and draw a line through all the words 
which do not belong there. Do not write anything. Do nothing except cross out 
the words which do not make sense with what has gone before. Is there anyone 
who does not understand what he is to do? Remember to cross out only the words 
which do not agree with what has gone before. All right! Go ahead!

"It happened in our country long ago, in those old days when only a few
white people lived here and everything was rough and civilized. Strong men were at
work among the hills, cutting down the brooks and planting corn in the new
fields, and towns were springing up all along the walls, but still there were many
miles of forest where Indians hunted and bears and wolves had their palaces."

¹⁴A 3x5 card was fastened to the copy of the supplementary reader which was
given to each pupil. Before the books were distributed to another class the rate
scores were recorded on the cards and a new card attached.

In this paragraph the words to be crossed out are "civilized",
"brooks", "walls", and "palaces". These answers were read to the 
pupils after they had marked the paragraph. In case any failed to 
understand the nature of the exercise it was explained to them. They
were then directed as follows:

In the following pages you will find part of a story. It is not a fairy story. 
In this story, as in the paragraph above, there are words which do not agree with
the meaning of what has gone before. Cross them out just as you did in the above 
paragraph. Be sure to cross out all the words which do not belong, but cross out 
only those words; for if you cross out any word which should not be crossed out 
it will be counted as a mistake. You will be allowed four minutes to work. Many 
of you will be unable to finish during this time. It is more important, however, 
to do your work correctly than to cover a great deal of ground. Do all three pages. 

When I say "begin" turn the page and start to work. If anyone finishes before 
the time is up, close your paper and sit quietly. Is there anyone who does not 
understand just what he is to do? All right! Begin! 

The directions to the fourth grade pupils differed from the above 
in only two respects. Two additional illustrative paragraphs were 
used and the time allowance was three minutes instead of four. 

The nature of the Pressey Silent Reading Test for Grades six 
to eight may be illustrated by the directions: 
Look at the first example given just below: 

1. February is the longest month in the year. The above statement is not 
true; but there is only one word that makes the sentence untrue. This one word 
is the word "longest"; if "longest" were changed to "shortest", the sentence would 
then read, "February is the shortest month in the year", which is true. "Longest" 
is wrong; so take your pencils and cross it out. Draw a line through it because
it is wrong. 

Look at the second example just below: 

2. The day dawned bright and dreary; the clear morning light streamed in 
through the windows and filled the room with its cheery brightness. 

In this paragraph, also, there is one, and only one, word that is wrong, the 
meaning of which does not fit in with the meaning of the rest of the paragraph. 
The word is "dreary". Cross it out. 

Two additional illustrative exercises were given and the pupil 
directed as follows: 

And now — everyone attention! In each of the paragraphs on the other side 
of the page, there is one, and only one, word that is wrong, which makes the paragraph
untrue, or whose meaning does not fit in with the meaning of the rest of
the paragraph. Cross that word out. And remember, there is only one word in
each paragraph that is wrong. Be sure to take the paragraphs in order. Never
skip a paragraph without attempting it. Read rapidly and accurately. You will
be given 10 minutes in which to work. Ask no questions.
Now, turn over the page, and all start! 

In the vocabulary tests the following directions, which are
printed on the test papers, were read to the pupils:

Below are 100 words which are designed to measure the size of your vocabulary.
Consider each one carefully, and place before it one of these four marks:

(1) the mark "D" if you could define it as exactly as words are ordinarily
defined in the dictionary.

(2) the mark "E" if you could explain it well enough to give some idea of
its meaning to one who is not familiar with it, though you could not give an exact 
definition that would satisfy an expert. 

(3) the mark "F" if the word is merely roughly familiar, so that you have 
only an indefinite idea of its meaning and could not use it intelligently. 

(4) the mark "N" if the word is entirely new and unknown to you. 

When you have finished, count the marks and fill out these blanks, making 
sure that the numbers add to one hundred. 

In the fourth grade these directions were modified somewhat 
in order to make certain that the pupils would understand them. 
Fifteen minutes were allowed for the test in both grades. 

The Cancellation Tests consist of a page of Spanish text. For
the "a-t" test the following directions were given to the pupils:

On this paper you will find a large number of words from a foreign language. 
Draw a line through each of these words which contain both an "a" and a "t." 

If the word has an "a" but not a "t" in it do not cross out the word. If it 
has a "t" but not an "a" do not cross it out. Be sure to draw a line through all 
words which contain both an "a" and a "t," but only through these words; for if 
you cross out a word which does not have both an "a" and a "t" in it, it will 
count as a mistake. When I say "begin" turn over your paper and begin work. 
You will be allowed two minutes to work. Your score will depend on the number 
of words you cross out correctly. 

In addition to this explanation of the test, four non-consecutive 
words were selected from the text and written on the blackboard 
in order to illustrate the kind of words to be crossed out. The explanation
for the "e-r" test is identical with the above except that
"e" and "r" are used in the place of "a" and "t."

In the Memory Tests the pupil was directed as follows: 
This is to be a test to see how well you remember what you hear. I am going 
to read a little story, and I want every one to pay close attention; for as soon as 
I have finished I want you to write down, in as nearly the same words as possible, 
what I have just read to you. Listen carefully, and as soon as I stop reading write 
down all that I have just read. Your score will depend on how nearly you remember
what has been read to you. Do not begin to write until I have finished
reading. Is there anyone who does not understand just exactly what he is to do?
All right! Attention! 

In the composition test the following topics were written on the 
blackboard. Then the directions given below were read to the pupils: 

AN EXCITING EXPERIENCE. 

A storm. An unexpected meeting. 

An accident. In the woods. 

An errand at night. In the mountains. 

A wonderful story. On the ice. 

A runaway. On the water. 

I want you to write me a story. It is to be a story about some exciting ex- 
periences that you have had, or about something very interesting that has happened 
to you. If nothing of the sort has ever happened to you, then tell me of an ex- 
citing experience someone whom you know has had. You may even make up a 
story of this kind, if you have to, though I believe you will do better, on the whole, 
with a real one. I am going to give you about twenty minutes in which to write. 
You are to write on both sides of the paper, to do all the work yourselves, and to 
ask no questions at all after you begin. You may make whatever corrections you 
wish between the lines. There will be no time to rewrite your story. 

I have written the general subject on the blackboard, together with some sug- 
gestions. You do not have to write on any of these topics unless you want to; 
they are merely to help out in case you cannot think of an exciting experience 
yourself. Is there anyone who does not understand just what he is to do? All 
right! Begin! 

Twenty minutes were allowed for the actual writing. Then the
pupils were directed as follows:

You are to have four or five minutes in which to finish your stories, make 
corrections, and count the number of words written. Write this number at the 
end of your story. 

Description of pupils' performances. In order to eliminate or 
reduce accidental errors and subjective errors to a minimum, all test 
papers were scored independently by two persons working under 
careful supervision. In the case of those scores for which the subjective
factor was negligible, any differences between the two scores
were reconciled by a third person.¹⁵ When a subjective error was
involved the average of the two scores was taken unless the differ- 
ence between them exceeded a fixed maximum. In this case the 
paper was scored by a third person in an attempt to reconcile the 
two scores. 
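The reconciliation rule for subjectively scored papers can be sketched as follows. This is a minimal illustration, not the Bureau's actual procedure: the function name is hypothetical, and the fixed maximum is passed in as a parameter because the text does not state its value.

```python
from typing import Optional


def reconcile(first: float, second: float, max_difference: float) -> Optional[float]:
    """Average two independent scores, or return None when the paper
    must go to a third scorer.

    `max_difference` stands in for the "fixed maximum" of the text;
    its actual value is not given there.
    """
    if abs(first - second) > max_difference:
        return None  # difference too large: refer to the third scorer
    return (first + second) / 2


# The threshold of 4 points is an invented value used only for illustration.
print(reconcile(20, 24, max_difference=4))  # prints: 22.0
print(reconcile(20, 30, max_difference=4))  # prints: None
```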

¹⁵This third person was the same for all tests, and also was the one who supervised
the scoring.

The description of a pupil's rate of reading is objective. Hence
only accidental errors are involved. The rate was expressed in
terms of words per minute. The scoring of comprehension in the
following tests was also highly objective: Monroe's Standardized
Silent Reading Tests, Courtis' Silent Reading Test No. 2, Cross-Out
Silent Reading Tests, Pressey's Silent Reading Test, and Cancellation
Test.

Monroe's Standardized Silent Reading Tests were scored for 
comprehension according to the usual directions with a few slight 
changes with respect to the answers which were considered correct. 
The pupil's comprehension score is the sum of the comprehension 
values of the exercises which he does correctly. 

The directions which accompany the Courtis Silent Reading 
Tests No. 2, provide for two measures of comprehension, the index 
of comprehension and the number of questions answered. The index 
of comprehension is found by subtracting the number of wrong an- 
swers from the number of right answers and dividing the difference 
by the number of right answers. In addition to these two scores 
the number of right answers was recorded. 
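The three Courtis scores described above can be computed as in the following sketch. The function names are hypothetical, and the sketch assumes that the number of questions answered is simply the sum of the right and wrong answers.

```python
def index_of_comprehension(right: int, wrong: int) -> float:
    """(right - wrong) / right, as described in the Courtis directions."""
    if right <= 0:
        raise ValueError("the index is undefined when no answers are right")
    return (right - wrong) / right


def questions_answered(right: int, wrong: int) -> int:
    """Assumption: every answered question is either right or wrong."""
    return right + wrong


# A pupil with 20 right answers and 5 wrong answers (invented figures).
print(index_of_comprehension(20, 5))  # prints: 0.75
print(questions_answered(20, 5))      # prints: 25
```

Because the divisor is the number of right answers rather than the number attempted, a pupil with no wrong answers receives an index of 1 however few questions he answers, which is why the number of questions answered is reported alongside the index.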

Two methods of scoring the Cross-Out Silent Reading Tests 
for comprehension were used. It was found that pupils made two 
types of errors. Some crossed out words which should not have 
been crossed out, and words which should have been crossed out 
were not marked. One description was obtained by taking the dif- 
ference between the number of words correctly marked and the 
number of words wrongly marked. (This included only the first 
type of error.) This score is indicated by the symbols c - w. In
the second score, the number of inconsistent words, which the pupil
failed to mark in the part of the test read, was recognized.
The score was obtained by evaluating the following fraction:

\[ \frac{c - w}{c + o} \]

In this fraction c and w have the same meaning as above and o
stands for the number of words omitted.¹⁶
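The two Cross-Out scores may be illustrated with a short sketch. It assumes that the fraction is (c - w)/(c + o), the form of the index of accuracy given by Whipple on the page cited; the function name and the sample counts are hypothetical.

```python
from typing import Tuple


def cross_out_scores(c: int, w: int, o: int) -> Tuple[int, float]:
    """Return the two comprehension scores for a Cross-Out test paper.

    c: words correctly marked
    w: words wrongly marked
    o: inconsistent words left unmarked in the part of the test read
    """
    difference = c - w  # first score, written c - w in the text
    if c + o == 0:
        raise ValueError("no inconsistent words fall in the part read")
    ratio = (c - w) / (c + o)  # second score, assumed form of the fraction
    return difference, ratio


# Invented counts: 18 correct marks, 2 wrong marks, 2 omissions.
difference, ratio = cross_out_scores(18, 2, 2)
print(difference, round(ratio, 2))  # prints: 16 0.8
```

The second score penalizes omissions as well as wrong marks, so a pupil who marks cautiously and misses many inconsistent words no longer scores as well as one who finds them all.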

In the Pressey Silent Reading Test a pupil's comprehension 
score is the number of exercises which he does correctly within the 
time allowed. In order to have an exercise counted as right the 
correct word must be crossed out and no other word in the para- 
graph marked. 

The Vocabulary Test was scored according to standard directions.¹⁷
Each "D" and "E" was regarded as indicating one point
and each "F" as indicating a half-point. (See the directions for the
vocabulary test above.) The total
number of points represents a vocabulary-index. This index, taken
as a percent and multiplied by 18,000, affords a measure of the size
of the pupil's total vocabulary.

¹⁶Whipple, G. M. Manual of Mental and Physical Tests, Simpler Processes,
p. 313.

¹⁷Whipple, G. M. Manual of Mental and Physical Tests, Part II, Complex
Processes, pp. 310-11.
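The vocabulary estimate described above may be sketched as follows. The function name and the sample counts are hypothetical; the point values, the 100-word list, and the multiplier of 18,000 are those stated in the text.

```python
def vocabulary_estimate(d: int, e: int, f: int, n: int) -> float:
    """Estimate total vocabulary from the counts of D, E, F and N marks."""
    if d + e + f + n != 100:
        raise ValueError("the four counts must add to one hundred")
    points = d + e + 0.5 * f  # the vocabulary-index, in points out of 100
    # Take the index as a percent of the 100 words, then multiply by 18,000.
    return points * 18000 / 100


# Invented paper: 20 "D", 30 "E", 20 "F" and 30 "N" marks give 60 points.
print(vocabulary_estimate(20, 30, 20, 30))  # prints: 10800.0
```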

In the cancellation tests the score was obtained by converting
rate and accuracy into a single index of efficiency (E).¹⁸ This index
was obtained by the following formulae:

\[ A = \frac{c - w}{c + o}, \qquad E = eA \]

Here A = the index of accuracy.
E = the index of net efficiency.
e = the number of words examined.
o = the number of words erroneously omitted.
c = the number of words crossed.
w = the number of words wrongly crossed.

After computing the index of accuracy the score in terms of the index
of efficiency was obtained.
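The two-step computation for the cancellation tests may be sketched as below. It assumes the formulae read A = (c - w)/(c + o) and E = eA, with the symbols defined as in the list above; the function name and the sample counts are hypothetical.

```python
from typing import Tuple


def cancellation_efficiency(e: int, c: int, w: int, o: int) -> Tuple[float, float]:
    """Return (A, E) for one cancellation test paper.

    e: words examined
    c: words crossed
    w: words wrongly crossed
    o: words erroneously omitted
    """
    if c + o == 0:
        raise ValueError("no words crossed or omitted in the part examined")
    accuracy = (c - w) / (c + o)  # index of accuracy, A
    efficiency = e * accuracy     # index of net efficiency, E = eA
    return accuracy, efficiency


# Invented paper: 200 words examined, 30 crossed, 2 of them wrongly, 5 omitted.
accuracy, efficiency = cancellation_efficiency(e=200, c=30, w=2, o=5)
print(round(accuracy, 2), round(efficiency, 1))  # prints: 0.8 160.0
```

Multiplying the number of words examined by the index of accuracy discounts a fast but careless performance, so that rate and accuracy are expressed in a single figure.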

The scoring of answers to questions obtained from Fordyce's 
Scale for Measurement of Achievement in Silent Reading and from 
the Experimental Reproduction Tests is less objective than the scor- 
ing of the tests just described. Fordyce gives a list of correct an- 
swers. This, together with the nature of the questions, makes the 
scoring of his test highly objective for its type. In the course of scoring 
the answers to the questions of the Experimental Reproduction Tests, 
lists of correct answers were compiled and all scoring was done in 
accordance with them. The acceptable answers were chosen with 
care from the complete array of all answers given in each of the 
tests. Any word or group of words judged to give correctly the total 
idea called for by the question was counted as correct. 

Scoring Reproductions. The reproductions obtained from 
Brown's Silent Reading Test, Starch's Silent Reading Tests, the 
Experimental Reproduction Tests, and the Memory Tests were scored 
by both the "idea-counting method" and the "word-counting 
method." In addition, Brown's tests were scored according to the 
directions which he gives. The description of a reproduction is not 
highly objective. Pupils differ widely with respect to vocabulary 
and to sentence structure. In addition to incorrect statements, re- 
productions contain superfluous statements and repetitions. The 
order of ideas is frequently transposed so that their significance is 
modified. Ideas contained in the passage read are expressed with
various degrees of completeness. These characteristics of reproductions
create many opportunities for differences of opinion in their
description.

¹⁸Whipple, G. M. Manual of Mental and Physical Tests, Part I, Simpler
Processes, pp. 312-13.

1. The idea-counting method. The first step in using this 
method is to divide the selection read into ideas. In making this
division one may adopt a relatively small unit, which is essentially 
a word or phrase, or a large unit, which approximates a sentence. 
After experimenting with these two plans of division the former was 
chosen. A portion of Brown's Silent Reading Test, "The Long 
Slide," with the divisions indicated, is reproduced below: 

THE LONG SLIDE 

The boys / and girls / who live / in a certain part / of a small / town/ in the 
country / several miles / from any village / attend / school / in a little / red / school- 
house / known as / the Long Hill / school. / 

It has / this name / because / it is situated / on the top / of a very long / steep/ 
hill./ Ever since anyone / can remember, / the scholars / of the Long Hill / school / 
have always had / time / to slide / down the hill / just once / at recess / in winter / 
and get back / to the school house / before the bell / rings / to call them back again / 
into school. / They can go down / very rapidly, / but it takes / a long time / to walk 
back./ 

Last Monday / morning / Frank Lane / appeared / at school / with a fine / new/ 
sled. / It was a double-runner / which his uncle, / who owns / a carriage factory / in 
the city, /had given him. / He named / his new / sled / the Simoon / and almost had/ 
a fight /with Tom Smith, / who said / it was foolish / to put / such a name / on a 
sled, / but he kept on / calling it / the Simoon. / 

At recess / that day / Frank / invited / the whole / school / to go / for a coast/ 
and the twelve / boys / and girls / got onto / the sled / and away they went / down 
the steep hill. / When recess was over / Miss Black, / the teacher, / rang the bell / 
but not a scholar / appeared./ Thinking that / the children / had stopped / to play / 
on the way back / from their slide, / Miss Black / went / to the door / and looked / 
down the hill / and rang / the bell / again./ But not a scholar / was in sight./ Then 
she was greatly astonished / and began / to be very angry, / for nothing / like this / 
had ever happened / in all of her twenty-eight / years / as a teacher. / She waited / 
and waited / but still / no scholars / appeared. / She stopped / every team / that 
came / up the hill, / but no one / had seen / anything of them. / 

She stayed / at the schoolhouse / and wondered / what had become of / her 
children / until it was time / to let out / school / and then / she went / over to John 
Reed's / who lives / nearest to the school house / and whose son / and daughter / 
were among the missing / scholars. / Mr. Reed / was greatly frightened / at what 
Miss Black / told him / about the disappearance / of her school / and immediately/ 
hitched up / his horse / to go in search / of the lost / children. / Just / as he was 
driving / out of the dooryard / the scholars / appeared / far down the hill. / It was 
almost / dark / before / they got back / to the schoolhouse. / 

The pupil's score is the number of ideas which he reproduces
correctly. Thus, the scorer must determine what ideas, occurring in
the passage read, appear in the pupil's reproduction. Two rules
were adopted.

1. Misplaced clauses and phrases, that is, clauses and phrases 
which are tacked on to the wrong part of a sentence, are to be 
counted as incorrect. 

2. Correct ideas found in a statement, which, as a whole, is
directly contrary to the meaning of the text read, are to be counted
as correct. The following example may be cited: "John Shane was
not cruel." Here, both the ideas, "John Shane" and "cruel," are held to
be correct, while "was not" is incorrect. In practically all cases coming
under this rule the incorrectness of the statement was caused
by the use of a wrong verb or a wrong adverbial modifier, as in this
illustration.
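
The bookkeeping of the idea-counting method may be sketched in
Python as follows. The judgment of which ideas a pupil has reproduced
remains with the scorer; the code only divides a marked selection
into its units and tallies the units the scorer has checked. The
function names and the set of checked units are hypothetical.

```python
def split_into_ideas(marked_text):
    """Divide a selection into idea units at the "/" marks."""
    return [unit.strip() for unit in marked_text.split("/") if unit.strip()]


def idea_score(ideas, checked_units):
    """Count the distinct, valid unit positions checked by the scorer."""
    valid = {i for i in checked_units if i in range(len(ideas))}
    return len(valid)


# First sentence of "The Long Slide" with the divisions indicated.
marked = (
    "The boys / and girls / who live / in a certain part / of a small / town / "
    "in the country / several miles / from any village / attend / school / "
    "in a little / red / schoolhouse / known as / the Long Hill / school. /"
)
ideas = split_into_ideas(marked)

# Hypothetical scorer's marks: positions (counting from 0) judged reproduced.
checked = {0, 1, 9, 10, 16}
print(len(ideas), idea_score(ideas, checked))
```

The sentence divides into 17 units, and the hypothetical pupil is
credited with 5 of them.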

The scorers were urged to keep in mind the general rule that 
they were to match up identical ideas in the passage read and in 
the pupil's reproduction, even though sometimes the ideas were not 
expressed in the same language. In order to secure independent 
scorings, each selection, with the divisions into ideas indicated as 
shown above, was mimeographed. The scorer indicated on this 
mimeographed copy the ideas which in his judgment the pupil had 
reproduced. In this way no record of the scoring was made on the 
pupil's test paper, and complete independence of scoring was secured. 

In putting together the results from two independent scorings,
when the difference in the number of ideas was six or less, the average
was taken. When the difference was more than six, a third
person went over both papers and changed any scoring that was too
lenient or too severe. These changes were made until the difference
was reduced to six or less. Then the average was taken.
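
The rule for combining two independent scorings can be sketched
in Python. The function name is hypothetical, and the allowed
difference is left as a parameter. A return value of None stands for
the case in which a third person must first revise the scorings.

```python
def combine_scorings(score_a, score_b, allowed_difference=6):
    """Average two independent scores when they agree closely enough.

    Returns None when the difference exceeds the allowed amount, which
    signals that a third person must review both papers first.
    """
    if abs(score_a - score_b) > allowed_difference:
        return None
    return (score_a + score_b) / 2


print(combine_scorings(40, 46))   # difference of six: the average is taken
print(combine_scorings(40, 47))   # difference of seven: review is required
```

After the third person has revised the scorings, the same function
may be applied again to the revised pair.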

Brown's method of idea-counting. Brown has given directions 
for describing the reproductions written by pupils in terms of 
"quantity of reproduction" and "quality of reproduction." As a 
basis for his method of scoring, the selection is divided into sections 
each of which he considers to represent a unit of thought. A por- 
tion of "The Long Slide" is reproduced to show his plan of division: 

THE LONG SLIDE 

The boys and girls who live in a certain part of a small town in the country
several miles away from any village attend school (1) in a little red schoolhouse
known as the Long Hill School. (2)

It has this name because it is situated on the top of a very long, steep hill. (3) 
Ever since anyone can remember, the scholars of the Long Hill school have always 
had time to slide down the hill just once at recess in winter and get back to the 
schoolhouse before the bell rings to call them back again into school. They can 
go down very rapidly, but it takes a long time to walk back. (4) 

Last Monday morning Frank Lane appeared at school with a fine, new sled. (5)
It was a double-runner which his uncle, who owns a carriage factory in the city, had 
given him. (6) He named his new sled the Simoon(7) and almost had a fight 
with Tom Smith, (8) who said it was foolish to put such a name on a sled, but 
he kept on calling it the Simoon. (9) 




At recess that day Frank invited the whole school to go for a coast, and the
twelve boys and girls got on to the sled and away they went down the steep hill. (10)
When recess was over, Miss Black, the teacher, rang the bell but not a scholar
appeared. Thinking that the children had stopped to play on the way back from
their slide, Miss Black went to the door and looked down the hill and rang the
bell again. But not a scholar was in sight. (11) Then she was greatly astonished
and began to be very angry, (12) for nothing like this had ever happened in all 
of her twenty-eight years as a teacher. (13) She waited and waited, but still no 
scholars appeared. (14) She stopped every team that came up the hill, but no one 
had seen anything of her school. (15) 

She stayed at the schoolhouse and wondered what had become of her children 
until it was time to let out school (16) and then she went over to John Reed's, who 
lives nearest to the schoolhouse (17) and whose son and daughter were among the 
missing scholars. (18) Mr. Reed was greatly frightened at what Miss Black told 
him about the disappearance of her school (19) and immediately hitched up his 
horse to go in search of the lost children. (20) Just as he was driving out of the 
dooryard, the school appeared far down the hill. (21) It was almost dark before 
they got back to the schoolhouse. (22) 

The idea which he considered expressed in each of these sec- 
tions has been condensed in a short statement. These form a key 
for scoring. The statements corresponding to the sections in the 
portion of the test reproduced above are given below: 

1. Some children in the country attend school. 

2. The schoolhouse is known as the Long Hill School.

3. It is situated on top of a long hill. 

4. The pupils slide down hill once at recess in winter. 

5. One day a boy brought to school a new sled.

6. His uncle had given it to him. 

7. He named it the Simoon. 

8. He almost had a fight with another boy. 

9. This boy said the name was foolish. 

10. At recess the pupils went for a slide. 

11. At the end of recess no pupils appeared. 

12. The teacher was astonished and angry. 

13. Nothing like this had ever happened before. 

14. After a long wait no scholars appeared. 

15. No one in passing teams had seen her school. 

16. She stayed at school until closing time. 

17. Then she went to the nearest neighbor. 

18. His children were among the scholars. 

19. He was greatly frightened.

20. He started to search for the children. 

21. Just then they appeared down the hill. 

22. They reached the schoolhouse just before dark. 

For using this key he gives the following directions:¹⁹

¹⁹Brown's statement of these directions has been modified in order to make
their meaning clear.



1. Each child's written reproduction should be carefully ex- 
amined, and the number of points in the key which are reproduced 
by him should be determined and expressed as a percent of the total 
number in that portion of the selection read. For example, in the 
part read by a certain child, there may have been forty-eight points, 
and he may have reproduced twelve of these. The amount repro- 
duced is, therefore, twenty-five percent of the amount read. This 
is called "quantity of reproduction". In arriving at a measure of 
quantity of comprehension, every idea reproduced by the child 
should be counted which, in most respects, is complete and which, 
in general, is correctly stated, even though some of the less important
details are lacking. Credit for quantity of comprehension is
given only when all elements of the idea expressed by the words in 
italics in the key are either expressed or plainly implied in the child's 
reproduction. 

2. The reproductions should be examined a second time and 
only those ideas counted which are entirely correct in every respect 
and of which every detail is reproduced. This is called "quality of 
reproduction". 
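
The arithmetic of the first direction may be sketched in Python.
The function name is hypothetical. The sketch covers only the
expression of a count as a percent of the points in the portion read.
Whether the stricter "quality" count is expressed in the same way is
an assumption and is not stated in the directions above.

```python
def percent_of_points(points_reproduced, points_in_portion_read):
    """Express the points reproduced as a percent of the points read."""
    if points_reproduced > points_in_portion_read:
        raise ValueError("cannot reproduce more points than were read")
    if points_in_portion_read == 0:
        raise ValueError("the portion read must contain at least one point")
    return 100.0 * points_reproduced / points_in_portion_read


# The example given in the directions: 12 of 48 points reproduced.
print(percent_of_points(12, 48))
```

For the example in the directions the result is twenty-five percent.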

2. The word-counting method. In applying this method, a 
pupil's reproduction is examined and the words which do not cor- 
rectly reproduce the selection read are crossed out. The pupil's 
score is the number of words remaining. The directions for cross- 
ing out words were essentially the same as those used by Starch in 
scoring his own silent reading tests. The scorers were directed to 
cross out the following classes of words: 

(a) Words which incompletely reproduce the thought. 

(b) Words which introduce new ideas. 

(c) Words which represent ideas reproduced elsewhere. 

(d) Superfluous connectives. 

The scorers were, also, directed to bear constantly in mind that 
the aim of this method is to ascertain the number of words which 
actually reproduce the thought contained in the passage read. In 
order to secure independence on the part of the scorers when using 
the word-counting method, the lines of the reproductions were num- 
bered. Sheets of ruled paper were then prepared with numbered 
lines. In scoring the reproductions, the words to be omitted in a 
line, when computing the pupil's score, were written on the corre- 
sponding line of the sheet of ruled paper. The number of words 
remaining in the line of the reproduction was then recorded in the 
right hand margin. The sum of these entries constituted a pupil's
score. No mark other than the numbers of the lines of the reproductions
was made upon the pupil's test paper. Thus, the second
scorer was not influenced in any way by the work of the first. The 
two independent scorings were reconciled by a third person, accord- 
ing to the rules given in the case of the idea-counting method, except 
that a difference of eight rather than of six was allowed before re- 
scoring was undertaken. This exception does not apply to the 
Memory Test. 
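
The line-by-line tally used in the word-counting method may be
sketched in Python. The function name and the sample reproduction
are hypothetical. The decision as to which words are crossed out
remains a judgment of the scorer and is supplied here as data.

```python
def word_count_score(lines, crossed_out):
    """Sum, over the numbered lines, the words remaining after crossing out.

    lines: the pupil's reproduction, one string per numbered line.
    crossed_out: maps a line number (counting from 1) to the list of
                 words the scorer wrote on the matching ruled line.
    """
    total = 0
    for number, line in enumerate(lines, start=1):
        words_in_line = len(line.split())
        rejected = len(crossed_out.get(number, []))
        if rejected > words_in_line:
            raise ValueError(f"line {number}: more words rejected than written")
        total += words_in_line - rejected   # entry for the right hand margin
    return total


# Hypothetical two-line reproduction and one scorer's record.
reproduction = [
    "The boys and girls went down the hill",
    "on a new sled and and had great fun",
]
record = {2: ["and", "great", "fun"]}
print(word_count_score(reproduction, record))
```

The first line keeps all 8 words and the second keeps 6 of its 9, so
the score is 14.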

Subjectivity of describing reproductions. An examination of 
the records of scoring the reproductions shows many differences of 
opinion on the part of the scorers. One scorer gave credit for 
certain words or ideas which the other scorer rejected, while the 
second scorer gave credit for words and ideas rejected by the first 
scorer. These differences of opinion tend to balance each other in 
the resulting scores but not entirely. For some reproductions, two 
persons will give the same score. For others, the two scores will 
differ. In a few cases the difference will be marked. Whenever 
there is a difference, at least one score, and probably both, involves
an error.²⁰ Even when the two scores are identical both may involve
an error.

Constant errors and variable errors. The scoring of reproduc- 
tions even under favorable conditions, such as prevailed in this 
investigation, involves two types of errors — constant errors and vari- 
able errors. A constant error results in a scorer assigning scores 
which, in general, are too high or too low. A liberal attitude toward 
the reproductions will result in high scores. On the other hand, a 
conservative procedure will result in low scores. An indication of 
the presence of a constant error may be secured by comparing the 
averages of the two sets of scores assigned independently by two 
scorers to the same set of papers. Any differences in their general 
policy will be reflected by a difference between the averages of the two 
sets of scores. However, this difference cannot be considered to be 
an index of the magnitude of the constant error because both per- 
sons may be inclined to be liberal in their scoring, or both may be 
conservative, or one may be conservative and the other liberal. 

²⁰A score is said to involve an error when it differs from the true score, which
is defined as the average of a large number of scores assigned by different persons.

Variable errors are indicated by the fact that in scoring one
reproduction Scorer A will assign a score of 90, and Scorer B a score
of 75; but in scoring a second reproduction Scorer A may assign a
score of 60, and Scorer B a score of 80. This may happen although
Scorer B is, in general, more liberal than Scorer A. In studying the
variable errors it is necessary to isolate them from the constant errors.
Constant errors which affect the average of the scores assigned
by either person do not affect the coefficient of correlation.
Hence, this coefficient may be used as an index of the magnitude of
the variable errors.
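
The distinction may be illustrated with a short Python sketch. The
scores are hypothetical, and the Pearson product-moment coefficient
is assumed to be the coefficient of correlation intended. Adding a
constant amount to every score of one scorer changes the difference
between the averages but leaves the coefficient unchanged.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(spread_x * spread_y)


# Hypothetical scores assigned by two scorers to the same six papers.
scorer_a = [90, 60, 72, 55, 81, 67]
scorer_b = [75, 80, 70, 58, 85, 73]

difference_of_averages = mean(scorer_a) - mean(scorer_b)
r_12 = pearson_r(scorer_a, scorer_b)

# Make Scorer B uniformly more lenient by ten points (a constant error).
lenient_b = [score + 10 for score in scorer_b]
shifted_difference = mean(scorer_a) - mean(lenient_b)
shifted_r = pearson_r(scorer_a, lenient_b)

print(difference_of_averages, shifted_difference)
print(r_12, shifted_r)
```

The difference of the averages moves by ten points, while the two
coefficients agree.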

Tables I and II give data relative to both the constant and 
variable errors involved in the word-counting and in the idea-count- 
ing methods. Table I shows the facts for the first method and 
Table II for the second. The scorers are represented by letters. 
The numbers in the column headed "Difference of Average Scores" 
were obtained by subtracting the average of the scores assigned by 
the second scorer from the average of the scores assigned by the 
first scorer. A positive difference means that the first scorer gave, 
on the average, higher scores than the second. A negative differ- 
ence has the opposite meaning. In some cases the difference closely 
approximates zero, but in others it is relatively large. This indi- 
cates that, for some scorers, the constant error is relatively large. 
One is justified in asserting that, on the basis of the possible con- 
stant error in the scores assigned to reproductions by a single scorer, 
no reliable inferences can be made concerning the differences in 
reading ability of two groups of pupils unless the differences 
between their average scores are large. 

TABLE I. SUBJECTIVITY OF SCORING REPRODUCTIONS BY THE WORD-
COUNTING METHOD

Test             Form  Grade  Number of  Scorers  Difference of    r12   P.E. Est.1t  P.E. Est.1t
                              scores              average scores                      / Average

Memory                 IV        92      Y-C          -9.9                   4.5         .06
Memory                 IV        27      Y-K          -5.1                   3.4         .04
Memory                 IV       116      Y-C          -2.0                   3.3         .05
Memory                 VII      123      Y-K          -7.5                   5.5         .05
Memory                 VII      100      Y-C          -8.2                   3.9         .04
Memory                 VII       31      Y-K          +4.1                   2.6         .03
Reproduction           IV        94      L-K          +6.8                   3.1         .06
Reproduction           IV        31      L-C          -1.6                   2.4         .06
Reproduction           IV        68      L-K          +4.7                   4.2         .10
Reproduction           VII      117      M-F          -0.5                               .06
Reproduction           VII      113      F-C          -6.0                               .05
Brown                  IV       111      T-Mj        +12.8                   9.2         .15
Brown                  IV       110      T-Mj         +6.9                   5.5         .08
Starch (No. 7)         VII      119      M-C          -5.8                   2.6         .07
Starch (No. 6)         VII      121      M-C          -2.0                   2.1         .05



TABLE II. SUBJECTIVITY OF SCORING REPRODUCTIONS BY THE IDEA-
COUNTING METHOD

Test             Form  Grade  Number of  Scorers  Difference of    r12   P.E. Est.1t  P.E. Est.1t
                              scores              average scores                      / Average

Memory                 IV       121      Y-P          +0.1         .95       1.1         .04
Memory                 IV       116      Y-P          +0.6         .84       1.1         .05
Memory                 VII      122      Y-P          +1.0         .89       1.6         .04
Memory                 VII      128      Y-P          +0.6         .85       1.0         .04
Reproduction           IV        94      F-P          -0.6         .94       1.6         .07
Reproduction           IV       100      F-P          +0.7         .95       1.4         .08
Reproduction           VII      116      F-P          -7.9         .91       5.6         .08
Reproduction           VII      112      S-F          +0.7         .88       4.5         .10
Brown*           I     V         77      Cl-S         +0.4         .88       2.5         .10
Brown*           II    V         75      Cl-S         +1.5         .85       2.4
Brown, Quantity        IV       112      P-C          +8.7         .69       8.4         .18
Brown, Quantity        IV       116      P-C          +7.8         .75       6.1         .16
Brown, Quality         IV       113      P-C          -6.7         .68       5.2         .24
Brown, Quality         IV       118      P-C          +0.1         .56                   .30
Starch (No. 7)         VII      122      S-Cl         -2.3         .92       1.6         .08
Starch (No. 6)         VII      124      S-Cl         -1.0         .95       1.3         .08

*Brown I is The Long Slide; Brown II, A Morning Adventure.

It appears that a scorer is not always consistent with respect 
to his constant error. In Table I, Scorer Y and Scorer K show neg- 
ative differences for two sets of papers and a positive difference for 
a third set. The same condition is exhibited by Scorer P and Scorer 
C in Table II. This reversal of policy may be due in part to differences
in the character of the reproductions, but, doubtless, the instability
of subjective judgment is also a factor.

In the column headed "r12", the coefficient of correlation between
the two sets of scores is given. In the next column the probable
error of estimate is given. This was calculated by the formula,²¹

\[ \text{P.E. Est.}_{1t} = .6745\,\sigma\sqrt{1 - r_{12}} \]

²¹The probable error of estimate for two sets of related data is given by the formula
$\text{P.E. Est.}_{1} = .6745\,\sigma_1\sqrt{1 - r_{12}^2}$. (See Yule, Introduction to the Theory of Statistics,
page 177.) In this formula $r_{12}$ is the coefficient of correlation between the two sets
of data and $\sigma_1$ is the standard deviation of the corresponding distribution. The
probable error of estimate for the first set of scores (P.E. Est.1) is a measure of the
amount of change which would be necessary to bring these scores into perfect correlation
with the other set of scores. Professor T. L. Kelley has shown that the correlation
between one set of obtained scores and the corresponding true scores is given
by the formula, $r_{1t} = \sqrt{r_{12}}$. Therefore, the formula, $\text{P.E. Est.}_{1t} = .6745\,\sigma_1\sqrt{1 - r_{12}}$,
gives the probable error of estimate of the first set of scores with respect to the corresponding
set of true scores. A similar formula would give the probable error of
estimate for the other set of scores. Since both sets of scores were assigned to the
same set of reproductions, the best measure is the average of the two formulae. Hence,
$\sigma$ is the average of $\sigma_1$ and $\sigma_2$.






As used here the probable error of estimate should be interpreted
as a description of the magnitude of the variable errors or
departures of the assigned scores from the corresponding true scores
after the constant error has been eliminated. We may, therefore,
speak of the probable error of estimate in this case as the probable
variable error of scoring. A probable variable error of scoring of 3.4
means that, in general, the variable errors for the two scorers from
whom the data were obtained are greater than 3.4 for fifty percent of
the scores. It also means that for fifty percent of the scores the variable
errors are less than 3.4.

The probable variable error of scoring cannot be given a definite 
significance except in comparison with the magnitude of the score with 
which it is to be associated. A probable error of 5 does not have 
the same meaning when associated with a score whose magnitude 
is 25 as it has when associated with a score of 100. It is, therefore, 
necessary to compare the probable variable error of scoring with the 
magnitude of the scores with which it is associated. The same de- 
gree of objectivity will result in larger variable errors of scoring for 
large scores than for small scores. Since the probable variable error 
of scoring which we have obtained is, itself, an "average" it may 
consistently be compared with the average score. This has been 
done in obtaining the quantities given in the last column of the 
table. The probable variable error of scoring has been divided by 
the average score. A quotient of .06 is to be interpreted as mean- 
ing that the chances are one to one that the score assigned to a paper 
will differ from the true score by as much as six percent of its magnitude.
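
The ratio in the last column may be sketched in Python, using the
comparison made above of a probable error of 5 with scores of 25 and
of 100. The function name is hypothetical, and the average of 40 in
the last line is an assumed figure chosen only to show a ratio of .06.

```python
def relative_variable_error(probable_error, average_score):
    """Ratio of the probable variable error of scoring to the average score."""
    if 0 >= average_score:
        raise ValueError("the average score must be positive")
    return probable_error / average_score


# The same probable error of 5 set against scores of different magnitude.
print(relative_variable_error(5, 25))    # one fifth of the score
print(relative_variable_error(5, 100))   # one twentieth of the score

# An assumed average of 40 with a probable error of 2.4.
print(round(relative_variable_error(2.4, 40), 2))
```

The first two ratios are .20 and .05, which shows why the probable
error must be read against the magnitude of the score.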

In both tables, the coefficients of correlation are high in the 
sense that most of them differ only slightly from 1.00. With the
exception of coefficients for "quality of reproduction" and "quantity 
of reproduction" of Brown's Silent Reading Test, only one is below 
.83. A number are above .90. There are four coefficients of .97. 
One is .98. With three exceptions, the number of cases on which 
these coefficients are based is sufficiently large so that the probable 
error of the coefficient of correlation due to sampling is relatively 
small. The description of the variable errors of scoring in terms of 
the probable variable error of scoring and the ratio of the probable 
variable error of scoring to the average suggest that these errors are 
much larger than might be concluded from a consideration of the co- 
efficients of correlation. For example, in Table I the highest coefficient 
of correlation is .98 for the second form of the Experimental Reproduction
Test in the fourth grade. The probable variable error of scoring
is 2.4 units, which is six percent of the average score. This
means that, in general, the chances are one to one that the score 
assigned to a pupil's reproduction in this group of papers will differ 
by at least six percent of its magnitude from the true score. This 
is the effect, only, of the variable error of scoring. The actual error 
of a pupil's score may be larger, due to the effect of the constant 
error on the part of the scorer. 

It should also be noted that the highest coefficient of correlation 
is not always paired with the lowest ratio of the probable error of 
scoring to the average. In Table II, a ratio of .04 is obtained for 
three tests. The coefficients of correlation for these are .95, .89, and 
.85. In Table I, there are four ratios of .06. The corresponding 
coefficients of correlation are .89, .97, .98, and .96. The lowest ratio,
.03, is associated with a coefficient of .90. Comparisons between
the coefficients of correlation and the probable variable errors of 
scoring, likewise, show many cases of non-agreement. In Table I, 
the largest probable variable error, 9.2, corresponds to a coefficient 
of correlation of .96. The lowest coefficient of correlation, .77, corresponds
to a probable variable error of 5.5. The smallest probable
variable error, 2.1, corresponds to a coefficient of .97. This lack
of agreement is due largely to differences in the magnitude of the 
scores. 

The scoring of Brown's Silent Reading Test for quality and 
quantity of reproduction clearly involves the largest variable error. 
This is indicated both by the coefficient of correlation and by the 
probable variable error of scoring. If we exclude from our consid- 
eration these two scores of Brown's test, neither the idea-counting 
method nor the word-counting method is distinctly superior. In 
general, the word-counting method appears to involve a slightly 
smaller variable error when this error is considered in relation to 
the average score. However, both methods must be described as 
highly subjective. They involve a probable variable error of scoring
of .06 or more in addition to a constant error which, in some cases, 
is probably large. 

The scoring of Brown's test appears to be somewhat less ob- 
jective than that of the others. This is especially true in the case 
of the word-counting method. In addition to the variable errors, 
this method appears to introduce a large constant error. The scores,
"quantity of reproduction" and "quality of reproduction," which
Brown recommends, are clearly less objective than the scores obtained
by either of the other methods. In fact, they are so highly
subjective that their use cannot be defended. 

Summary for describing reproductions. The description of reproductions
involves large errors, both constant and variable. Even
when the scoring is done under careful supervision reliable scores 
cannot be expected. For this reason, alone, silent reading tests re- 
quiring reproduction cannot be considered satisfactory. The method 
which Brown recommends for scoring reproductions appears to be 
inferior to both the word-counting method and the idea-counting 
method. 

Scoring answers to questions. The scoring of the answers to 
the questions in the case of the Experimental Reproduction Tests 
and Fordyce's test is not perfectly objective unless an elaborate list 
of acceptable answers is prepared. This was done for both of these 
tests and, consequently, the scores used in this study may be con- 
sidered objective in the sense that the scoring approximated uni- 
formity. These tests, however, should not be considered as being 
perfectly objective when used independently by different persons 
who do not have access to elaborate directions for scoring. 

Describing the quality of compositions. The scoring of the com- 
positions for story value by means of the Willing Scale for Written 
Composition is not highly objective. Eighty-six compositions were 
scored independently by two persons. The difference between the 
averages of the two sets of scores was 6.7. The coefficient of corre- 
lation between the two sets of scores was .86. The probable variable 
error of scoring was 2.9 and the ratio of this to the average was .04. 
The magnitude of the variable error of scoring indicated by the prob- 
able error and its ratio to the average is less than that involved in 
either method of scoring the reproductions. 

Time required for scoring test papers. All scorers kept a record 
of the time devoted to scoring the different tests. As we have in- 
dicated, care was exercised in the scoring and this probably tended 
to increase the time consumed. Furthermore, in the scoring of re- 
productions the procedure followed was not the most economical 
one. The average number of papers scored per hour is given in
Table III. The most rapid scoring was done in the case of the 
questions of the Experimental Reproduction Tests. The scoring 
was nearly as rapid for Monroe's Standardized Silent Reading Tests 
and for the Pressey Test. The scoring of the tests requiring repro- 
ductions was relatively slow except in the case of Starch's Silent 
Reading Tests for ideas. 




Average scores and standard deviations. In Tables IV and V, 
the average scores and standard deviations are given for each of the 
tests in each grade. The averages for the comprehension scores in- 
dicate that widely different units are used in describing the per- 
formances on the different tests. In the fourth grade the averages 
range from 6.2 for one method of scoring the Cross-Out Test to 87, 
the average index of comprehension yielded by the Courtis Silent 
Reading Test, No. 2. Even in the case of tests for which the unit 
is given the same name we have differences in magnitude. For ex- 
ample, the word is used as a unit in describing the reproductions. 
The average scores for tests requiring reproduction differ widely 
for the same pupils. In the seventh grade the average score for 
Form I of Starch's test is 40; for the Experimental Reproduction 
Test it is 155. The conditions under which these two tests are ad- 
ministered are not the same and this is, doubtless, one factor which 
causes the difference in the scores. Differences in the difficulty of 
the tests also tend to produce differences in the average scores. It 
is, however, likely that the units are not equivalent in the two cases. 
At least, they do not have equivalent interpretations when used as 
measures of comprehension. 



TABLE III. AVERAGE NUMBER OF PAPERS SCORED 
PER HOUR 



Test 


Method of 
Scoring 


Grade 




IV 


VII 




Usual 

Usual 

Word 
Idea 

Word 
Idea 

Word 

Idea 

Question 

Usual 

Usual 

Usual 

Usual 

Usual 


48 
26 
15 

60 
21 

27 

20 


S3 


Courtis 




Brown 


_ 


Brown 


— 




18 




43 


Reproduction 

Reproduction 

Reproduction .... 

Cross-Out 


II 

8 

56 

39 




28 




47 


Vocabulary 

Composition 


16 
13 



26 



TABLE IV. AVERAGE SCORES AND STANDARD DEVIATIONS FOR MEASURES 
OF COMPREHENSION 



Test 



Grade IV 



Form I 



Av. 



Form II 



Av. 



Grade VII 



Form I 



Av. 



Form II 



Av. 



Monroe . 



Courtis, Index 

Courtis, Question 

Courtis, Questions Coirect 



133 



84.3 
36 



Brown, Quantity.. 
Brown, Quality. . . 
Brown, Average. . 
Brown, Efficiency. 
Brown, Words. . . . 
Brown, Ideas 



Starch, Words. 
Starch, Ideas. . 



Reproduction, Question... 

Reproduction, Ideas 

Reproduction, Words. ... 



Cross-Out, C — W. 

Cross-Out, . 

C+0 



Fordyce. 
Pressey. . 



Memory, Ideas. . 
Memory, Words. 



Vocabulary. . 
Composition. 



10.3 

20.8 
54.5 

6.2 

42.2 

62.5 



26.9 

77-7 

45,0 



6.2 

16.2 

II-3 
9.2 

20.9 
17.6 
18. 1 
35-3 
23-4 
IC.5 



2.8 
II .1 

30.5 

29.9 
159 



6.8 
19.6 

17. 1 



15-5 



91 

17.8 
40.2 

8.5 
43-4 



21.3 
76.0 



5-1 

14.2 
10.4 
II .0 

14-5 
II. 7 
12.2 
19.9 
21.5 
8.0 



2.3 
10.4 

25.7 

5-9 

27.4 



3-9 
134 



23 -9 



29 -3 



9.7 



40.9 
16.9 



65 
155 

16 

67 



72.3 
139 

36.2 
104 -3 

63 -4 

67.2 



18.3 
8.7 

2.7 

33-5 

82.2 

7-1 

22.5 

17 

3 

7-4 
19 " 



38.1 
18.7 

9-9 

42.5 
104.2 

18.1 
69.5 



14. 1 

27.9 
91 .0 



22.3 
9.6 

3-0 
23.6 
54-3 

8.2 

23 -7 



3-2 

3-8 
I3-I 



The non-equivalence of units is even more obvious in the case 
of the average rate scores. In four of the tests the pupil is engaged 
in continuous reading: Courtis Silent Reading Test, No. 2, Brown's 
Silent Reading Test, Starch's Silent Reading Tests, and the Experi- 
mental Reproduction Tests. The average rate scores for these tests 
exhibit differences sufficiently large to indicate that a word is not 
a constant unit for the measurement of the rate of reading. For ex- 
ample, the rate score for Form 3 of the Courtis Silent Reading Test 
is 153 words per minute. For Brown's Silent Reading Test the rate 
is 182 words per minute. Similar differences are to be found in the 






TABLE V. AVERAGE SCORES AND STANDARD DEVIATIONS FOR MEASURES OF RATE



Test 



Grade IV 



Form I 



Av. 



Form II 



Av. 



Grade VII 



Form I 



Av. 



Form II 



Av. 



Monroe 

Courtis 

Brown 

Starch 

Reproduction.. . . 

Cross-Out 

Fordyce, Words. 
Pressey 



79-5 
150.0 
164.6 

151 .6 

75-3 
125 .0 



24.9 
47-9 
60.3 

77-7 
28.2 
27.0 



94-9 
1531 
182.5 

J54-7 
84.0 



21 .1 

55-2 
78.5 

77-1 

22.2 



Composition (Number of 
words written.) 



104.0 



193.0 

218.3 

III .7 

179.0 

24.0 

218.6 



30.9 



56.5 
82.7 
30.1 
35-5 
1-5 

85. 5 



140.7 

202.8 
216.3 
133-8 

23-5 



24.8 



88.9 

70.2 
34-3 

1.7 



seventh grade between Starch's Silent Reading Tests and the Ex- 
perimental Reproduction Tests. 

The rate scores for all of the tests are expressed in terms of words 
per minute. However, in the case of Monroe's Standardized Silent 
Reading Tests, The Cross-Out Silent Reading Tests, and the Pressey 
Silent Reading Test, the pupil does not do continuous reading. He 
must stop frequently to give responses. This, naturally, tends to 
reduce the rate scores. This is clearly shown in Table V. The rate 
scores for these tests are in most cases considerably less than rate 
scores in tests where the pupil does continuous reading. The differ- 
ence is less marked in the seventh grade than in the fourth. 

In Fordyce's Scale for Measuring Achievement in Reading, the 
pupil reads continuously, but the time allowance is such that a ma- 
jority of the pupils complete the reading. Thus, they do not have 
an opportunity to give evidence of their rate of reading. This is the 
principal reason why the average rate scores for Fordyce's Tests are 
smaller than for the other tests in which the pupil does continuous 
reading. 

The standard deviations also exhibit differences. Differences 
in the magnitude of the units would naturally affect the standard
deviations as well as the averages. The standard deviation is also 
affected by the shape of the distribution. In a number of cases, the 



distribution of scores does not approximate the normal shape. This 
is, doubtless, one factor affecting the differences between the stand- 
ard deviations. 

Equivalence of duplicate forms. The facts given in Tables IV
and V indicate that the forms of these tests are not equivalent. In
some cases an effort was made to construct the different forms so
that they would be equivalent. This is true of Monroe's Standardized
Silent Reading Tests. A study* planned to determine the degree
of equivalence of these tests has indicated very definitely that
they are not equivalent. The degree of non-equivalence revealed 
by that study is approximately that which is indicated here. The 
two forms of the Experimental Reproduction Tests, which were 
constructed without any preliminary study to determine their equiv- 
alence, appear to be as nearly equivalent as those of any other 
test in the list, as far as the rate is concerned. In the case of com- 
prehension, there is considerable difference between the average 
scores. The two forms of the Cross-Out Tests were also constructed 
without much regard to equivalence and the average scores differ 
widely in most cases. 

There is no published statement concerning the procedure 
followed by the authors of the other tests in order to secure 
equivalence of the duplicate forms. The average scores for the 
Courtis Silent Reading Test No. 2 do not differ widely. In fact, 
the two forms of this test appear to be the most nearly equiv- 
alent of any of the tests studied. The two Starch tests, No. 6 
and No. 7, were not intended by the author to be equivalent 
forms. No. 7 (Form I) was intended to be more difficult, and 
lower average scores are, therefore, to be expected. This is what 
we find, except for the word-counting method of describing the 
reproductions. It is, however, obvious that it is difficult, or im- 
possible, to construct duplicate forms which will be essentially equiv- 
alent, especially in the case of a small group of pupils. In addition 
to any lack of equivalence which may exist, the practise effect, due to 
one form being given after the other, would tend to produce dif- 
ferences between the average scores. The amount of this practise 
effect was not studied, since it was not pertinent to the major problem.

*Monroe, Walter S. Report of Division of Educational Tests for 1919-20.
University of Illinois Bulletin, Vol. XVIII, No. 21, page 19.



Relation of vocabulary to difficulty. In an effort to determine 
whether the vocabulary of a selection tends to determine its diffi- 
culty, the selections read by pupils in tests requiring reproduction 
were analyzed. All the words occurring in each selection were listed 
and the frequency of each one determined. The number of words 
in each selection not occurring in Ayres' list of one thousand words 
was also determined. In the case of the selections which formed 
duplicate tests, the vocabularies were compared, and the number 
of words common to the two selections was found. The results of 
this study are given in Table VI. For the Courtis Silent Reading 
Test, No. 2, 16 percent of the vocabulary in Form 1 and 19 percent 
of the vocabulary in Form 3 are not found in the Ayres' list. The
number of different words, or the vocabulary, of Form 1 is 37 per- 
cent of the length of the selection. This means that, on the average, 
each word in the selection is used nearly three times. In the case 
of Form 3, the number of different words is 44 percent 
of the total number of words in the selection. The number of words 
common to the two selections is 15 percent of the average number 
of words in the two selections. These facts show that for these two 
forms of the Courtis Silent Reading Test, No. 2, the two selections 
are approximately equivalent with respect to the percent of words 
not found in the Ayres' list. Form 3 contains a slightly larger
percent of words not in this list. Such words will, in general, be 
unusual words unless they are proper names. Form 3 has a rela- 

TABLE VI. ANALYSIS OF SELECTIONS READ BY PUPILS IN 
SILENT READING TESTS REQUIRING REPRODUCTION 



Test 



Courtis I 

Courtis III 

Starch, No. 6 

Starch, No. 7 

Brown 

Long Slide 

Morning Adventure. . 

Old English 

Heroes I 

II 

The Strike at Shane's I . 
II 



Words not 

in Ayres' 

list. 



.16 
• 19 

■30 
■31 



■13 
■14 



19 
19 

19 
19 



Different 
words. 



•37 
•44 

•55 
•59 



•37 
•35 



43 
.44 

■44 

•52 



Words 
common to 
both selec- 
tions. 



.15 



•13 

•13 
.13 






tively larger vocabulary, and makes a greater demand upon a pupil's
acquaintance with words. The percent of words which are common 
to the two selections is surprisingly small in view of the simple char- 
acter of the material and of the fact that the two selections are con- 
sidered equivalent in difficulty. 

A comparison of the facts contained in Table VI with those in 
Tables IV and V indicates that the explanation for the non-equiv- 
alence of the two forms of the same test is not to be found in the 
vocabularies of the two selections in the respective tests. Evidently, 
the difficulty of a selection is determined by some factor other than 
the actual words used. 
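
The three proportions reported in Table VI can be sketched in a few lines of Python. This is a minimal illustration and not the procedure actually followed for the bulletin: the tokenizer, the function names, and the tiny stand-in word list are all hypothetical (Ayres' list itself is not reproduced here), and reading "the average number of words in the two selections" as the average of the two running-word counts is an assumption.

```python
import re


def tokenize(selection):
    """Split a selection into lowercase running words (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", selection.lower())


def analyze_selection(selection, reference_list):
    """Return the two single-selection proportions described in the text."""
    running_words = tokenize(selection)
    vocabulary = set(running_words)
    return {
        # Different words as a fraction of the length of the selection.
        "different_words": len(vocabulary) / len(running_words),
        # Fraction of the vocabulary not found in the reference list.
        "not_in_list": len(vocabulary - set(reference_list)) / len(vocabulary),
    }


def common_words(selection_a, selection_b):
    """Words common to both selections, relative to the average length.

    Assumption: "average number of words" means the mean of the two
    running-word counts.
    """
    words_a, words_b = tokenize(selection_a), tokenize(selection_b)
    shared = set(words_a) & set(words_b)
    average_length = (len(words_a) + len(words_b)) / 2
    return len(shared) / average_length


# Toy data, invented for illustration only.
form_a = "the cat saw the dog and the cat ran"
form_b = "the dog ran home"
stand_in_list = {"the", "and", "cat", "dog"}

summary = analyze_selection(form_a, stand_in_list)
shared = common_words(form_a, form_b)
print(summary, round(shared, 2))
```

On this reading, a value of .37 for different words means that each word occurs about 1/.37, or roughly 2.7, times on the average, which is the "nearly three times" of the text.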

Formation of composite scores. The scores yielded by the
different tests are expressed in terms of different scales. Therefore,
it is necessary to reduce them to a common scale before combining
them to form composite scores. The procedure adopted was to
choose as a base the scale of Monroe's Standardized Silent Reading
Test I, Form 1, for the fourth grade and the scale of Test II,
Form 1, for the seventh grade. All other scores were reduced to
the scale of these tests. The formula for reducing the scores obtained
from one scale to equivalent scores on another scale is as
follows:

$$S_1 = \frac{\sigma_1}{\sigma_2} S_2 + \left(Av_1 - \frac{\sigma_1}{\sigma_2} Av_2\right)$$

In this formula, $S_2$ is the obtained score on Form 2 and $S_1$ is the
equivalent score expressed in terms of the scale of Form 1. $Av_1$ refers
to the average of the scores obtained from Form 1; $Av_2$ refers
to the average of the scores obtained from Form 2. The standard
deviation of the distribution of the Form 1 scores is $\sigma_1$, and $\sigma_2$ is the
standard deviation of the distribution of the Form 2 scores. This
formula is based upon the usual assumption that corresponding
deviations from averages are equal when expressed in terms of the
standard deviation of the distribution; in other words, that

$$\frac{S_1 - Av_1}{\sigma_1} = \frac{S_2 - Av_2}{\sigma_2}$$

When this equation is solved for $S_1$ we obtain the formula as given
above. The application of the above formula involves the determination
of the numerical value of the ratio $\sigma_1/\sigma_2$ by which the Form
2 score is to be multiplied and the determination of the numerical
equivalent of the constant term of the formula (i. e., of the expression
in parentheses). This latter numerical equivalent may be plus or
minus. When it is positive it is to be added and when negative it
is to be subtracted.
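
The reduction to a common scale can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the numbers in the example are hypothetical and are not taken from the tables of this bulletin.

```python
def equivalent_score(s2, av1, sd1, av2, sd2):
    """Express a Form 2 score on the scale of Form 1.

    av1, sd1: average and standard deviation of the Form 1 scores.
    av2, sd2: average and standard deviation of the Form 2 scores.
    """
    ratio = sd1 / sd2             # multiplier applied to the Form 2 score
    constant = av1 - ratio * av2  # constant term (the expression in parentheses)
    return ratio * s2 + constant


# Illustrative values only: a Form 2 score of 50 lies half a standard
# deviation below the Form 2 average, so its equivalent lies half a
# standard deviation below the Form 1 average.
s1 = equivalent_score(50, av1=40, sd1=10, av2=60, sd2=20)
print(s1)  # 35.0
```

Computing the ratio and the constant once per pair of forms, as the text describes, leaves only one multiplication and one addition for each pupil's score.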

After the scores were reduced to the same scale, composite
scores were formed by calculating the averages of certain groups of
scores. Composite AI is the average of Monroe, Form 1 (comprehension),
Courtis, Form 1 (answers correct), and Reproduction,
Form 1 (answers correct). (In the seventh grade the Courtis Test
was not given and this composite score includes only the other two
tests.) Composite AII is obtained from the second form of these
tests.* Composite BI is the average of Brown's Silent Reading
Test (both quality and quantity scores), and the Experimental Reproduction
Tests (ideas and words). In the seventh grade, Starch's
Silent Reading Test† (ideas and words) is used in the place of Brown's
test. Composite CI is the average of Composite AI and Composite
BI. Composites BII and CII were obtained in a corresponding way
from the second forms of these tests. Composite I is obtained by
combining all Form 1 scores. Composite II is obtained by combining
all Form 2 scores.

Reliability. Since, with the exception of Fordyce's Scale for
the Measurement of Achievement in Reading, two forms of each
test were given, it is possible to compute measures of the extent to
which equivalent scores were yielded by the different forms of a
test. It is also possible to compute the probable error of measurement,
which is a measure of the magnitude of the departures of the obtained
scores from the corresponding true scores.‡ These departures
are the variable errors of measurement. No account is taken of the
constant error of measurement in the following discussion. In the
case of the tests for which the scoring is subjective, the computed
reliability is greater than the true reliability for the reason that the
averages of two independent scorings were used instead of the scores
assigned by one person.§

Methods of determining reliability. In Tables VII and VIII,
the reliability of these tests is described in terms of four quantities.
(1) The coefficient of reliability is represented by the symbol $r_{12}$
and is the coefficient of correlation between the two sets of scores
yielded by the two forms of the test. (2) The index of reliability
is represented by the symbol $r_{1\infty}$. This quantity is the coefficient

*In the case of the Courtis Tests, Form 3 was used instead of Form 2.
†No. 7 is Form 1 and No. 6, Form 2.

‡A true score is defined as the average of the scores yielded by a large number
of duplicate forms of a test.

§See page 17 for the exact method used.




of correlation between one set of obtained scores and the set of corresponding
true scores. The relation between the index of reliability
and the coefficient of reliability is expressed by $r_{1\infty} = \sqrt{r_{12}}$.
This formula was used in calculating the indices of reliability given
in these two tables. (3) The probable error of measurement is
represented by the symbol $P.E._M$. This quantity was calculated
by the formula*

$$P.E._M = .6745\,\sigma\sqrt{1 - r_{12}}$$

The probable error of measurement ($P.E._M$) is a measure of the variable
errors of measurement, or the differences between the obtained
scores and the corresponding true scores. (4) The ratio of the
probable error of measurement to the average of the scores from
which it was calculated is represented by the symbol $P.E._M / Av.$ Table
VII gives information concerning the reliability of rate scores and
Table VIII, the corresponding information for comprehension scores.
In case the test was scored by more than one method, the information
is given for all methods of scoring.
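
The quantities derived from the coefficient of reliability can be sketched as below. This is a hedged illustration, not the bulletin's own computation: the function names are hypothetical, and treating $\sigma$ as the standard deviation of the obtained scores is an assumption, since the formula is explained on page 24, outside this section.

```python
import math


def index_of_reliability(r12):
    """Index of reliability: the square root of the coefficient of reliability."""
    return math.sqrt(r12)


def probable_error_of_measurement(sd, r12):
    """P.E.M = .6745 * sigma * sqrt(1 - r12).

    Assumption: sd is the standard deviation of the obtained scores.
    """
    return 0.6745 * sd * math.sqrt(1.0 - r12)


def ratio_to_average(sd, r12, average):
    """Ratio of the probable error of measurement to the average score."""
    return probable_error_of_measurement(sd, r12) / average


# A coefficient of .85 gives an index of about .92.
print(round(index_of_reliability(0.85), 2))  # 0.92

# Illustrative values only (not taken from Tables VII and VIII).
pe_m = probable_error_of_measurement(sd=10, r12=0.75)
print(round(pe_m, 4), round(ratio_to_average(10, 0.75, 50), 4))
```

Dividing by the average puts tests scored in different units on a comparable footing, which is why the text later prefers this ratio when ranking the tests.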

Probable error of r due to sampling. The coefficients of cor- 
relation, given in Tables VII and VIII and in the following tables, 
are subject to an error of sampling when interpreted with respect 
to the existence of relationship between the two sets of data from 
which they were derived. All of the correlations in the following 
tables are based on 80 cases in the fourth grade and 91 in the seventh 

TABLE VII. MEASURES OF RELIABILITY, RATE 



Test 



Grade IV 



rit 



P.E.M 



P.E.M 



Av. 



Grade VII 



rit 



P.E.M 



P.E.M 



Av. 



Monroe I-II. . 
Monroe I-III. 
Monroe I I-II I 

Courtis 

Brown 

Starch 

Reproduction. 

Cross-Out 

Pressey 



.76 
.64 



.85 



74 



.87 
.80 
.82 

.92 
•93 

.86 



"•3 
13.6 
II. 8 

19-3 
26.0 



39-5 
14.4 



•13 

• 15 
.12 

■13 
•15 

.26 
.18 



.63 
■55 
.69 



.62 

■45 
.76 
■50 



■79 
■74 
•83 



■79 
■67 
.87 
•71 



17.0 
16.6 
12.3 



44.8 

56.6 

15^8 

I .1 



.11 
.11 

.09 



■23 
.26 

13 

•05 



*For explanation of this formula and the method of application see page 24.




TABLE VIII. MEASURES OF RELIABILITY, COMPREHENSION 



Test 



Grade IV 



P.E.M 



P.E.M 



Av. 



Grade VII 



P.E.M 



P.E.M 



Av. 



Monroe I-II 

Monroe I-III 

Monroe II-III 

Courtis, Index 

Courtis, No. of Questions 
Courtis, No. of Questions 
Correct 

Brown, Quantity 

Brown, Quality 

Brown, Average 

Brown, Efficiency 

Brown, Words 

Brown, Ideas 

Starch, Words 

Starch, Ideas 

Reproduction, Questions, 

Reproduction, Ideas 

Reproduction, Words 

Reproduction, Questions 
and Words 

Cross-Out C-W 

Cross-Out 

C+0 

Pressey 

Memory, Ideas 

Memory, Words 



•54 
■50 

•44 
.62 

■52 



•35 
.40 



76 



• 69 

•73 
■71 

•63 

• 79 

.72 



•59 
•63 



3.0 
3^4 

2.5 

9 9 

4.6 

5-7 

14.2 

13-1 
16.6 
20.1 

13-7 
6.6 



19 
19-5 

4.0 

3-4 
19.9 



4-3 
12.8 



.20 
•37 
• 41 

•30 

•47 
■47 



.18 
■17 



.69 
.60 
.61 



•83 

• 77 
.78 



5^2 
5.6 

5^2 



• 17 



77 
72 

60 

72 
87 

64 

67 
52 

•65 

•56 
•34 



•75 



9^7 
4.8 



1-9 



33 
13 3 



•25 
• 27 

.20 
•15 
.19 



.26 
.21 



14 



.10 
.13 



grade. In order to economize space, we give in Table IX probable
errors due to sampling for various values of r. Most of the coefficients
of correlation appearing in these tables are sufficiently large
in comparison with the probable error due to sampling that they
may be interpreted as indicating the existence of a distinct positive
relationship. We are, however, more interested in securing a measure
of the departure from perfect correlation. Hence, the probable
error of measurement ($P.E._M$) is a much better index of the degree
of reliability of a test than either $r_{12}$ or $r_{1\infty}$.
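
The probable error of a coefficient of correlation due to sampling can be sketched as follows. The bulletin does not print the formula; the usual expression, .6745 (1 - r^2) / sqrt(n), is assumed here, and the function name is hypothetical. Under that assumption the entries of Table IX (for example .0798 for r = .1) are reproduced with n = 70, so that value is used in the example, although the text names 80 and 91 cases.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of r for n cases (assumed formula, see lead-in)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# n = 70 is an inference from the tabled values, not a figure stated in the text.
for r in (0.1, 0.5, 0.95):
    print(r, round(probable_error_of_r(r, 70), 4))
```

A coefficient several times as large as its probable error is the kind of value the text treats as indicating a distinct relationship.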

Reliability of the tests studied. Brown's Silent Reading Test,^ 
when scored in the way which he recommends, is the least reliable. 
The ratio of the probable error of measurement to the average is .54 
for the quality and .55 for the average of quantity and quality. 






TABLE IX. PROBABLE ERRORS OF THE COEFFICIENT OF CORRELATION ($r_{12}$)
DUE TO USING A LIMITED NUMBER OF CASES*

$r_{12}$     P. E.
.1       .0798
.2       .0774
.3       .0734
.4       .0677
.5       .0605
.6       .0516
.65      .0466
.70      .0411
.75      .0353
.80      .0290
.85      .0224
.90      .0153
.95      .0079

*80 in Grade IV and 91 in Grade VII.

The "efficiency score" has a ratio of .43. The scoring of this test 
by means of either the idea-counting method or the word-counting 
method results in scores that are more reliable. Considering both 
rate and comprehension, the most reliable test is the Courtis Silent 
Reading Test, No. 2. For rate, the index of reliability is .92 and the 
ratio of the probable error of measurement to the average is .13. 
Three comprehension scores are used in connection with this test. 
The number of questions answered is shown to be the most reliable. 
The probable error of measurement and the ratio of the probable 
error of measurement to the average score indicate a degree of 
reliability for the rate scores yielded by Monroe's Standardized 
Silent Reading Tests which is surprisingly high, considering the 
character of the tests. This is particularly true in the seventh 
grade. With the exception of Pressey's Silent Reading Test, they 
are the most reliable. In the fourth grade, the reliability is exceeded 
only by Courtis's Silent Reading Test, No. 2. In Monroe's Stand- 
ardized Silent Reading Tests a pupil does not read continuously but 
is forced to stop at the end of each exercise and answer a question. 
According to the rules for scoring these tests, a pupil receives no 
credit for an exercise unless he has completed his reading of it to the 
extent of recording his answer. The increments added to a rate 
score for doing additional exercises are relatively large, particularly 
in Test II. Thus, a pupil who has failed only in recording his an- 
swer to an exercise receives a score which does not indicate his rate 
of reading. His score is the same as that of the pupil who has just 
barely completed the preceding exercise. In all of the other tests 
with the exception of Pressey's Silent Reading Test, the pupil's 
rate score represents the actual amount read. In view of these facts 




it is surprising to find that Monroe's Standardized Silent Reading 
Tests yield rate scores which have such a high degree of reliability. 
The figures which are given may be affected somewhat by the fact 
that these tests proved too short and a considerable number of pu- 
pils made perfect scores. 

In general, the degree of reliability is higher in the seventh 
grade than in the fourth. Exact comparisons cannot be made be- 
cause identical tests were not given in the two grades; but where 
similar tests were given the results for the seventh grade show a dis- 
tinctly higher degree of reliability. This may be due to a superior- 
ity in the tests for the seventh grade or it may be due to the fact that 
the increased maturity of the pupils causes them to be less variable 
in their performances. 

The degree of unreliability shown in Tables VII and VIII is 
distressingly high. As we have indicated, the ratio of the probable 
error of measurement to the average probably furnishes the most 
significant statement of the degree of unreliability. Brown's Test, 
scored by any method, appears to be so highly unreliable that it 
should be rejected. In interpreting the figures in Table VIII it 
should be borne in mind that the actual degree of unreliability is some- 
what larger than that indicated because the element of subjectivity 
in scoring has been largely eliminated. It appears that individual 
scores yielded by these tests are very imperfect measures of reading 
ability. However, the variable errors involved do not affect, to the 
same degree, the scores of classes or larger groups. Although the 
scores yielded by these tests must be considered as having only a 
very limited significance in the case of individual pupils, they are 
much more significant for groups of pupils. 

Both the Experimental Reproduction Tests and the Cross-Out 
Tests were merely experimental. The reproduction tests were in- 
tentionally so. It was desired to ascertain whether a crude repro- 
duction test, such as might be constructed by a teacher and ad- 
ministered directly from a supplementary reader, would yield results 
as reliable as tests more carefully constructed and more conveniently 
arranged. These tests are shown to be among the least reliable, 
with the exception of Brown's Silent Reading Test. This is to be 
expected; but the difference in reliability, particularly in the seventh 
grade, is not marked. In fact, the Experimental Reproduction 
Tests exhibit a relatively high degree of reliability in the measure- 
ment of comprehension. Thus, the reliability of a crude test of 




this type is only slightly less than that of tests whose construction 
was more refined. 

Discrimination. The distributions of the rate scores yielded by 
the different tests indicate that certain tests fail to yield scores which 
discriminate between a number of pupils with respect to rate of 
reading. Form 3 of Monroe's Standardized Silent Reading Tests 
I and II is clearly too short. In the seventh grade 58 percent of the 
pupils and in the fourth grade 27 percent completed the test. All 
such pupils received the maximum rate score. The distributions
for Forms 1 and 2 of this test contain no such extreme deviations
from the normal shape, although Form 2 of Test I and Form 1 of
Test II cannot be said to approximate closely the normal distri- 
bution. 

The Cross-Out Tests yield distributions which exhibit many 
irregularities and which cannot be said to do more than suggest 
the normal distribution. As was to be expected, a large percent of 
the pupils completed the reading of the selection in the case of the
Fordyce test. Forty-nine percent of the pupils in the fourth grade
and 29 percent in the seventh grade received the maximum rate
score. The Pressey Test proved too short for the time allowed.
Seventy-six percent completed Form 1 and 56 percent completed
Form 2. The Courtis, Brown, Starch, and Experimental Repro- 
duction Tests yielded rate scores which formed distributions closely 
approximating the normal shape. A few irregularities were exhibited 
by the Experimental Reproduction Tests and by Brown's test. 

As judged by the shape of the distribution of the rate scores,
the Courtis Silent Reading Test, No. 2, exhibits the least lack of 
discrimination. The Cross-Out, Pressey, Fordyce, and Form 3 of 
Monroe's tests exhibit such great departures from the normal dis- 
tribution that they must, obviously, fail to discriminate properly 
with respect to the rate of reading for a considerable number of 
pupils. 

In the case of comprehension, the distributions of scores for 
Monroe's Standardized Silent Reading Tests closely approximate 
the normal. The third form appears to have been a little too easy;
but, in other respects, the irregularities exhibited by the distribu- 
tions cannot be considered to indicate a serious lack of discrimina- 
tion. The index of comprehension for the Courtis Silent Reading 
Test, No. 2, fails to discriminate properly between a number of pupils. 
Both the number of questions answered and the number of questions 
answered correctly approach more nearly the normal distribution.




TABLE X. CORRELATIONS WITH TEACHER RATING 



Test 



Rate 



Grade IV Grade VII 



Comprehension 



Grade IV 



Grade VII 



Monroe I. . . 
Monroe II. . 
Monroe III. 



Courtis I Index
Courtis I Questions
Courtis I Questions Correct
Courtis I Words per minute
Courtis III Index
Courtis III Questions
Courtis III Questions Correct
Courtis III Words per minute



.38 
.34 
• 43 



•SI 



Brown I Quantity
Brown I Quality
Brown I Average
Brown I Efficiency
Brown I Words
Brown I Ideas
Brown I Words per minute
Brown II Quantity
Brown II Quality
Brown II Average
Brown II Efficiency
Brown II Words
Brown II Ideas
Brown II Words per minute



Starch I Words 

Starch I Ideas 

Starch I Words per minute. 

Starch II Words 

Starch II Ideas 

Starch II Words per minute. 



Reproduction I Questions 

Reproduction I Ideas 

Reproduction I Words 

Reproduction I Words per minute. 

Reproduction II Questions 

Reproduction II Ideas 

Reproduction II Words 

Reproduction II Words per minute. 



Cross-Out I 



Cross-Out I C-W 

C-W 

C+0 

Cross-Out I Words per minute. 
Cross-Out II C-W 

C-W 
Cross-Out II 



c+o 

Cross-Out II Words per minute. 



Fordyce. 



Pressey I. . 
Pressey II. 



Composite AI . . 
Composite All. 
Composite BI. . 
Composite BII. 
Composite CI. . 
Composite CII. 
Composite I. . . 
Composite II. . 



.36 



.32 



.36 



. 19 

• 41 

.26 



■ 55 
• Sl 



.29 

.08 



.60 
.64 
.63 

■ 29 
. 29 

• 41 

• 45 

• 38 
•SI 



.58 
.60 
.44 

• 34 

• 59 

• 56 



•32 
.50 
• 39 



.46 

• 46 

• 51 

.49 



• 34 

• 47 

• 23 

• SI 

• 49 
.46 



. 21 
.27 



.46 

• 37 

.40 

• 55 
•S3 
.58 
.51 
.63 

• 58 

• 58 






This is particularly true of the latter. The distributions for the
Brown, Starch, and Experimental Reproduction Tests exhibit many
irregularities; but there is in all cases a distinct resemblance to the
normal distribution. A few of the distributions approach very
closely the normal one. Others contain rather marked departures
from it. In the case of Brown's test, the distributions for the quality
scores exhibit greater departures than the distributions for the quantity
scores.

Comparison with teachers' ratings. All scores, both rate and
comprehension, were correlated with the ratings in silent reading
given by the teacher. The coefficients of correlation were calculated,
also, for certain composite scores. These coefficients of
correlation are given in Table X. With the exception of one coefficient
for the second form of Brown's test, all coefficients are positive
and in general sufficiently large to indicate a distinct positive relationship
between the test scores and the teachers' ratings. Rate
of reading correlates more highly with the teachers' rating in the
fourth grade than in the seventh. For rate, the average of the
coefficients, not including the composite scores, is .43 in the fourth
grade and .26 in the seventh. The average of the coefficients for
comprehension, not including the composite scores, is .40 in the
fourth grade and .44 in the seventh.

In the fourth grade, comprehension, as measured by Monroe's 
Standardized Silent Reading Tests, correlates most highly with the 
teachers' ratings. In fact, the coefficients for the three forms of 
this test equal or exceed all of those for the composite scores. In the 
seventh grade this test does not exhibit as high correlations with 
teachers' ratings. Neither do its rate scores correlate as highly 
with teachers' ratings as the rate scores yielded by some other tests. 
It is interesting to note that, for the second form of Brown's Test,
the correlations of teachers' ratings with "quantity of reproduction"
and "quality of reproduction" are essentially zero. For Form 1 the
correlations for these two scores are lower than the correlations for
any other scores. This suggests that Brown's method for scoring his
test is undesirable. The correlations of the composite scores with
teachers' ratings indicate that, in the fourth grade, teachers judge
silent reading ability more on the basis of the pupils' ability to answer
questions than of their ability to reproduce. In the seventh grade,
the teachers give greater weight to the pupils' ability to reproduce
or to tell what has been read.

Correlation of comprehension with memory. In those tests 
which require the pupil to answer questions from memory or to 




TABLE XI. CORRELATION OF COMPREHENSION WITH MEMORY



Test 



Brown I Quantity 

Brown II Quantity 

Brown I Quality 

Brown II Quality 

Starch I Ideas 

Starch II Ideas 

Starch I Words 

Starch II Words 

Reproduction I Questions 
Reproduction II Questions 

Reproduction I Ideas 

Reproduction II Ideas 

Reproduction I Words 

Reproduction II Words. . . , 

Monroe I 

Monroe II 

Monroe III 

Maximum 

Minimum 

Average 



Grade IV 



Ideas 



•32 

.27 

.36 
• 19 



II 



29 



Words 



■ 39 

.23 

.36 
• 14 



II 



28 



Grade VII 



Ideas 



II 



Words 



II 



•31 

•25 

■47 
•34 

.26 
.20 

.36 
•35 

■33 
■39 

•35 
•24 
.26 

■47 
.20 

• 32 



TABLE XII. CORRECTED COEFFICIENTS OF CORRELATION OF 
COMPREHENSION WITH MEMORY 



Test 



Grade IV 



Ideas 



Words 



Grade VII 



Ideas 



Words 



Brown Quantity 

Brown Quality 

Starch Ideas 

Starch Words 

Reproduction Questions 
Reproduction Ideas. . . . 
Reproduction Words. . . 

Monroe I-II 

Monroe I-III 

Monroe II-III 



.67 
.68 



.66 
■54 






reproduce the passage read, it would seem that a pupil's ability to 
remember would materially affect his comprehension score. In 
order to ascertain the extent to which ability to remember does affect 
the comprehension score yielded by such tests, the pupils were given 
the memory test* described on page 7. In this test a selection was
read to the pupils and they were asked to reproduce the story from 
memory. The coefficients of correlation between the memory 
scores and the comprehension scores for silent reading tests are given 
in Table XI. It is significant that none of these coefficients are 
large. The first three tests listed in this table require the pupil to 
give his performances from memory. Monroe's Standardized Silent
Reading Tests do not appear to make any considerable demand 
upon the pupil's memory; he has the passage before him and can read 
it and re-read it if he desires. If any memory is involved it is im- 
mediate in character. It is significant that the coefficients of cor- 
relation for this test closely approximate those for other tests. 

Corrected coefficients of correlation. The measures yielded 
by these tests involve variable errors. It has been shown in our 
consideration of the reliability of these tests that these errors are 
relatively large for the reproduction tests. The presence of these 
variable errors tends to reduce the coefficients of correlation, and it 
is possible that the coefficients of correlation given in Table XI do 
not represent the true relation between comprehension and memory. 

When two forms of both tests have been given to the same pupils
it is possible to compute a corrected coefficient of correlation which
is free from the effect of the variable errors of measurement. This
has been done by means of the following formula:†

$$r_{pq} = \frac{\sqrt{(r_{p_1 q_2})(r_{p_2 q_1})}}{\sqrt{(r_{p_1 p_2})(r_{q_1 q_2})}}$$

$r_{pq}$ here indicates the true correlation between two series of measures,
p and q, of the facts A and B.
$p_1$ and $p_2$ are two independent measures of A.
$q_1$ and $q_2$ are two independent measures of B.
$r_{p_1 q_2}$ is the correlation obtained from the first measure of A and the
second measure of B.
$r_{p_2 q_1}$ is the correlation obtained from the second measure of A and
the first measure of B.
$r_{p_1 p_2}$ is the correlation between the two measures of A.
$r_{q_1 q_2}$ is the correlation between the two measures of B.

*It is assumed that this test measures ability to remember.

†Thorndike, E. L. "An Introduction to Mental and Social Measurements."
New York. Teachers College, Columbia University, 1916. Page 179.

In applying this formula the factors of the numerator are obtained
from Table XI. For example, in calculating the corrected
coefficient of correlation for Brown's Silent Reading Test with
memory, $r_{p_1 q_2}$ is the coefficient of correlation of Brown I with
Memory II. This is given as .21. The coefficient of correlation of Brown
II with Memory I is $r_{p_2 q_1}$. This is given as .27. The factors of the
denominator are the reliability coefficients of the two tests. These
are to be found in Table VIII. They are .36 for Brown's Silent
Reading Tests and .35 for the Memory Tests. Substituting these
values in the formula,

$$r_{pq} = \frac{\sqrt{.21 \times .27}}{\sqrt{.36 \times .35}} = \sqrt{.45} = .67$$
This is the first entry of the first column of Table XII. 
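
The substitution above can be checked with a short sketch. It assumes only the correction formula quoted from Thorndike and the four coefficients cited in the text for Brown's test and the Memory Test; the function name `corrected_correlation` and its parameter names are illustrative and do not come from the bulletin.

```python
import math

def corrected_correlation(r_p1q2, r_p2q1, r_p1p2, r_q1q2):
    """Corrected coefficient: sqrt(r_p1q2 * r_p2q1) / sqrt(r_p1p2 * r_q1q2)."""
    # Numerator: the two cross-correlations between the measures of A and B.
    numerator = math.sqrt(r_p1q2 * r_p2q1)
    # Denominator: the two reliability coefficients.
    denominator = math.sqrt(r_p1p2 * r_q1q2)
    return numerator / denominator

# Values cited in the text: Brown I with Memory II (.21), Brown II with
# Memory I (.27), reliability of Brown (.36) and of the Memory Test (.35).
brown_with_memory = corrected_correlation(0.21, 0.27, 0.36, 0.35)
print(round(brown_with_memory, 2))  # 0.67
```

The printed value agrees with the .67 obtained in the worked substitution.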

A study of the corrected coefficients given in Table XII indi- 
cates that, in the case of the Experimental Reproduction Tests in 
the fourth grade, the correlation between Memory and the scores 
based upon the pupil's reproduction is very high. For ideas it is 
.97. For words it is .88. For Brown's Silent Reading Tests the 
correlation is not as high. In fact, it closely approximates that for 
Monroe's Standardized Silent Reading Tests. In the seventh grade 
the correlation of Memory with Monroe's Standardized Silent Read- 
ing Tests is higher than that for either Starch or the Experimental 
Reproduction Tests, although the difference is not marked in the 
case of the latter. It, therefore, appears that in the seventh grade 
memory is not a major factor in determining the comprehension 
scores of tests which require reproduction unless it is also the de- 
termining factor in the case of tests which do not appear to involve 
memory. The statement which has been made with reference to 
reproduction tests, that they measure the ability to read and
remember, does not appear to be justified by the facts which are
presented here.

Correlation of comprehension with vocabulary. In Table XIII, we
give the coefficients of correlation between the comprehension scores
and the scores obtained from the vocabulary test. In the fourth
grade most of the coefficients are negative, but all of them cluster
closely around zero. This means that, measured by the tests used,
there is no relation between a pupil's vocabulary and his ability to
read. It is, of course, obvious that, in order to read, a pupil must
be acquainted with words. It is, therefore, impossible to believe
that vocabulary is not a factor in the reading process. The facts
presented here probably mean that, in the fourth grade, vocabulary
is not a determining factor and the pupil's ability to read depends
primarily upon abilities other than the extent of his acquaintance
with words. In the seventh grade the coefficients are all positive
but none of them are large. This probably means that, in the
seventh grade, vocabulary is a minor factor in determining the pupil's
comprehension. It is, of course, possible that the vocabulary test
used does not measure the extent of a pupil's acquaintance with
words.

TABLE XIII. COEFFICIENTS OF CORRELATION BETWEEN VOCABULARY AND
COMPREHENSION

Columns: Test; Grade IV; Grade VII.

Tests, in the order listed:
Monroe I; Monroe II; Monroe III
Courtis I Index; Courtis I No. of Questions; Courtis I Questions Correct
Courtis III Index; Courtis III No. of Questions; Courtis III Questions Correct
Starch I Words; Starch I Ideas; Starch II Words; Starch II Ideas
Brown I Quantity; Brown I Quality; Brown I Average; Brown I Words; Brown I Ideas
Brown II Quantity; Brown II Quality; Brown II Average; Brown II Words; Brown II Ideas
Reproduction I Questions; Reproduction I Ideas; Reproduction I Words
Reproduction II Questions; Reproduction II Ideas; Reproduction II Words
Cross-Out I C-W; Cross-Out I (C-W)/(C+O); Cross-Out II C-W; Cross-Out II (C-W)/(C+O)
Fordyce
Pressey I; Pressey II
Composite AI; Composite AII; Composite BI; Composite BII; Composite CI;
Composite CII; Composite I; Composite II

Coefficients (sequence as scanned; row and column alignment not preserved):
.02, -.03, -.02, -.20, .19, .10, -.20, .06, .15;
-.11, -.12, .14, .01, -.04, -.23, -.21, -.16, -.15, -.19,
-.15, -.10, -.09, .12, -.04, -.04, -.07, -.05;
.09, .02; .04; .22, .22, .13; .31, .31, .29, .22;
.14, .17, .13, .19, .24, .26, .18, .08; .16, .01; .13; --; .21;
.00, -.02, .23, .01, .20, -.08, .32, -.20, .28, -.05, .26, -.13, .25, .30, .21
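
The coefficients discussed in this section are raw coefficients of correlation between paired scores. As a minimal sketch, assuming they are the usual product-moment (Pearson) coefficients, which this section does not state explicitly, such a coefficient could be computed as follows. The function name `pearson_r` and the score lists are hypothetical and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical vocabulary and comprehension scores for six pupils.
vocabulary = [12, 15, 9, 20, 17, 11]
comprehension = [8, 7, 9, 8, 6, 9]
print(round(pearson_r(vocabulary, comprehension), 2))
```

A coefficient near zero, as in the fourth-grade column, would indicate that pupils' ranks on the two measures are nearly unrelated.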

Correlation of cancellation scores with measures of rate of reading. 
In Table XIV, the coefficients of correlation for the scores yielded 
by the Cancellation Test with measures of rate of silent reading are 
given. With few exceptions, these coefficients are positive but small.
In general, they are slightly smaller in the seventh grade than in 
the fourth grade. In most cases, there does not seem to be any 
marked relationship between ability to do the Cancellation Test and 
the rate of silent reading. One might expect a distinct positive re- 
lationship between the Cross-Out Silent Reading Tests and the 
Pressey Silent Reading Tests. It does, however, appear that the 
relationship which exists with respect to these tests is greater than 
that which exists for Monroe's Silent Reading Tests. 

The table also includes coefficients of correlation for the scores 
yielded by the Cancellation Test with the comprehension scores 
yielded by the Cross-Out Tests. The coefficients are, likewise, small, 
two of them being slightly negative. It appears, therefore, that the 
ability to strike out letters from words is not related to the ability 
called for by the Cross-Out Tests. 

TABLE XIV. CORRELATION OF CANCELLATION SCORES WITH MEASURES
OF RATE OF READING AND WITH THE CROSS-OUT TESTS

Columns: Test; Grade IV, Cancellation I and II*; Grade VII,
Cancellation I and II*.

Tests, in the order listed:
Monroe I; Monroe II; Monroe III
Courtis I; Courtis III
Brown I; Brown II
Starch I; Starch II
Reproduction I; Reproduction II
Cross-Out I; Cross-Out II
Fordyce, No. of Words
Pressey I; Pressey II
Cross-Out I C-W; Cross-Out I (C-W)/(C+O); Cross-Out II C-W; Cross-Out II (C-W)/(C+O)

Coefficients (sequence as scanned; row and column alignment not preserved):
.28, .26, .23, .12, .20, .25, .07; .22, .13, .20, .23, .30;
.08, .02, .17, .14; .14, .15, .20, .15, .14, .16, .08; .13; .15, .13;
.07, .10, .03; .20, .18, .22; .01, .03, .06, .03; .06, -.01, .15, .03;
.05, .10; .25, .22; .18, .14; .25, .33; .08, .11; .03, .06; .11, .15;
.21, .18, .11, .05, .16, .17, .11, -.01

*In Cancellation Test I, the words containing both "a" and "t" were marked;
in Test II, those containing both "e" and "r."

Correlation of comprehension with written composition. Another
measure of a pupil's vocabulary is secured from his written
composition. The pupils in the seventh grade were asked to write a
composition on an exciting experience. (See page 10.) In Table
XV, we give the coefficients of correlation between measures of
comprehension and two measures of these written compositions, the
number of words written and the story value. The number of words
which a pupil writes in such an exercise is, undoubtedly, an index
of his writing vocabulary. It is, of course, possible that his writing
vocabulary and his reading vocabulary are not closely related. The
coefficients of correlation, in Table XV, show that there is little or
no relation existing between measures of comprehension and the
number of words which were written in these compositions. Even in
the case of comprehension scores based upon the number of words
and the number of ideas contained in reproductions, the coefficients
of correlation fail to indicate the existence of any marked
relationship. In fact, the coefficients of correlation for measures of
comprehension gained through reproduction are lower, in most cases,
than the coefficients of correlation of the number of words written
with the comprehension scores derived from Monroe's Standardized
Silent Reading Tests.



A higher degree of correlation is indicated between the "story
value" and the measures of comprehension. Some of the coefficients
of correlation are sufficiently large to indicate a distinct positive
relationship between these two traits. It is not unlikely that this
relationship can be explained in terms of a common general factor,
such as general intelligence.

TABLE XV. CORRELATION OF COMPREHENSION WITH WRITTEN
COMPOSITION, SEVENTH GRADE, 90 PUPILS

Columns: Test; Number of words written; Story value.

Tests, in the order listed:
Monroe I; Monroe II; Monroe III
Starch I Words; Starch I Ideas; Starch II Words; Starch II Ideas
Reproduction I Questions; Reproduction I Ideas; Reproduction I Words
Reproduction II Questions; Reproduction II Ideas; Reproduction II Words
Cross-Out I C-W; Cross-Out I (C-W)/(C+O); Cross-Out II C-W; Cross-Out II (C-W)/(C+O)
Fordyce Percent
Pressey I; Pressey II

Coefficients (sequence as scanned; row and column alignment not preserved):
.18, .24; .29, .33, .31; .10, .07, .14, .09; .31, .28, .36, .33;
.12, .11, .22; .24, .14, .18; -.07, .26, .28; .11, .37, .43;
.13; .23; .09; .11; .16; .06; .04; .11; .12; .12; .10, .05; .29, .18

Inter-correlation between tests. Since in each grade all of the
tests were given to the same pupils, it is possible to calculate the
coefficients of correlation between scores yielded by the different
tests. These are given in the appendix. The magnitude of the
coefficients of correlation is influenced by the reliability of the scores
and, therefore, does not truthfully reflect the relationship which
exists between the scores yielded by the different tests. In order
to secure more accurate indices of the relationship existing between
traits measured by the different tests, the corrected coefficients of
correlation have been calculated by means of the formula given on
page 41. Since the factors of both numerator and denominator of the
formula are square roots, it is impossible to calculate corrected
coefficients when one of the raw coefficients is negative. This
accounts for the fact that certain corrected coefficients are not given
in Tables XVI and XVII. It will be noted in these tables that,
occasionally, a coefficient greater than 1.00 is given. This is due
to chance errors in the raw coefficients of correlation which, in turn,
are due to the fact that a sample of the total population was used in
calculating them. The corrected coefficients are, in general, larger
than the corresponding raw coefficients.
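
The two limitations just described can be illustrated with a short sketch. It assumes the correction formula quoted earlier from Thorndike; the function name `corrected_or_none` and all numerical values are hypothetical and are not entries from Tables XVI or XVII.

```python
import math

def corrected_or_none(r_p1q2, r_p2q1, r_p1p2, r_q1q2):
    """Return the corrected coefficient, or None when it cannot be computed."""
    # Every factor sits under a square root, so a negative raw
    # coefficient blocks the calculation.
    if min(r_p1q2, r_p2q1, r_p1p2, r_q1q2) < 0:
        return None
    # A zero reliability would make the denominator zero.
    if r_p1p2 == 0 or r_q1q2 == 0:
        return None
    return math.sqrt(r_p1q2 * r_p2q1) / math.sqrt(r_p1p2 * r_q1q2)

# Hypothetical case 1: one raw coefficient is negative, so no corrected
# value can be reported.
print(corrected_or_none(-0.05, 0.20, 0.40, 0.50))  # None

# Hypothetical case 2: raw coefficients that are large relative to the
# reliabilities push the corrected value above 1.00.
inflated = corrected_or_none(0.40, 0.45, 0.36, 0.35)
print(inflated > 1.0)  # True
```

Because the raw coefficients come from a sample, the second case can arise by chance even when the true correlation cannot exceed 1.00.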

Table XVI gives the corrected coefficients for the comprehen- 
sion scores. A significant characteristic of this table is the variation 
in the degree of intercorrelation between the tests. For example, 
Monroe's Standardized Silent Reading Test I correlates very highly 
with the number of questions answered correctly on the Courtis 
Silent Reading Test, No. 2. It correlates less highly with the other 
two scores of this test. The degree of its correlation with the other 
tests is moderately low. It is significant that the corrected coeffi- 
cients of correlation between the two tests requiring reproduction 
are not higher. For example, the highest coefficient of correlation 
between Brown's test and the Experimental Reproduction Test I 
is .79. The lowest is .26. The corrected coefficient of correlation
between the scores obtained by the word-counting method is .33; 
for the idea-counting method the coefficient of correlation is .62. 
The highest correlation between Brown's test and the Experimental 
Reproduction Test I is for the number of questions answered cor- 
rectly. In the seventh grade, the corrected coefficients of 
correlation between the question scores yielded by the 
Experimental Reproduction Test II and Starch's Silent Reading 
Test are as high as those obtained from the reproductions. Both 
Starch's test and the Experimental Reproduction Test correlate 
nearly as highly with Monroe's Standardized Silent Reading Test 
as with each other. A number of the coefficients of correlation for 
the Cross-Out Test are relatively high. It correlates most highly 
with Monroe's Standardized Silent Reading Test. In general, the 
coefficients are higher for the scores obtained by $C - W$ than for
$\frac{C - W}{C + O}$. The former is probably the better plan of scoring.

Table XVI appears to bear out the usual assumption that different
silent reading tests measure different phases of silent reading
ability. It is very obvious, in a number of cases, that the same
traits are not measured by different tests. However, it should be
noted that these differences exist for tests that are similar in
structure as well as for tests which possess marked differences in
structure. In fact, the variations in these corrected coefficients of
correlation are so erratic that one is inclined to be skeptical of any
conclusions which may be drawn from them with reference to the
functions of the different tests.

[TABLE XVI. Corrected coefficients of correlation for the comprehension
scores. The table is printed sideways and its entries are not legible.
Legible headings: Index of Comp., No. of Questions, Questions Correct,
Quantity, Quality, Average, Words, Ideas, Questions, and the form pairs
I-II, I-III, II-III.]

[TABLE XVII. Corrected coefficients of correlation for the rate scores.
The entries are not legible. Legible column headings: Grade IV Monroe
(I-II, I-III, II-III); Grade VII Monroe (I-II, I-III, II-III); Grade IV
Courtis; Grade IV Brown; Grade VII Starch; Grade IV Reprod.; Grade VII
Reprod.; Grade IV Cross-Out; Grade VII Cross-Out; Grade VII Pressey;
Grade IV Comp.; Grade VII Comp.]

The corrected coefficients for the rate scores are given in Table 
XVII. These are, in general, higher than those for comprehension. 
In general, the correlation between tests in which the pupil reads 
continuously is higher than between one test in which the pupil 
reads continuously and another in which his reading is not contin- 
uous. However, the correlation between Monroe's Standardized 
Silent Reading Test I and the Cross-Out Test, in the fourth grade, 
is as high as that for any of the other tests. The fact that some of 
the tests were too short and failed to discriminate between a consid- 
erable number of pupils probably accounts for the fact that a num- 
ber of coefficients of correlation are not higher. An examination of 
this table indicates that the rate score secured by means of Monroe's 
Standardized Silent Reading Tests is a true measure of the pupil's 
rate of reading. 

Correlation of single tests with composites. In Tables XVI and 
XVII, the corrected coefficients of correlation for each test with cer- 
tain composite scores are given. These, in general, are larger than 
the coefficients of correlation between single tests. In the fourth 
grade, composite A for comprehension is the average of Monroe, 
comprehension, Courtis, answers correct, and Reproduction, answers 
to questions. In the seventh grade, the Courtis test was not given 
and this composite includes only the other two tests. Composite B 
for comprehension is the average of the comprehension scores de- 
rived from reproductions. In the case of Brown's Silent Reading 
Tests, both quality and quantity are used. In the other cases, the 
scores obtained by both the idea-counting method and the word- 
counting method are used. Composite C is the average of composite 
A and composite B. The general composite is formed by combining 
all of the scores obtained. 
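
The way the composites are built up can be sketched in a few lines. This is only an illustration: it assumes the component scores have already been placed on a common scale, which this section does not specify, and the function name `composite_scores` and the pupil's scores are hypothetical.

```python
from statistics import mean

def composite_scores(question_scores, reproduction_scores):
    """Average the component scores into composites A, B and C."""
    # Composite A: scores based on answers to questions.
    composite_a = mean(list(question_scores.values()))
    # Composite B: scores derived from reproductions.
    composite_b = mean(list(reproduction_scores.values()))
    # Composite C: the average of composite A and composite B.
    composite_c = mean([composite_a, composite_b])
    return {"A": composite_a, "B": composite_b, "C": composite_c}

# Hypothetical fourth-grade pupil, scores assumed to share one scale.
fourth_grade_pupil = composite_scores(
    question_scores={
        "Monroe comprehension": 10,
        "Courtis answers correct": 14,
        "Reproduction answers to questions": 12,
    },
    reproduction_scores={
        "Brown quality": 6,
        "Brown quantity": 8,
        "Reproduction ideas": 9,
        "Reproduction words": 13,
    },
)
print(fourth_grade_pupil)
```

For a seventh-grade pupil the same function would be called with only the Monroe and Reproduction question scores, since the Courtis test was not given in that grade.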

Monroe's Standardized Silent Reading Tests are shown to cor- 
relate very highly with composite A. The correlation with com- 
posite B is very much less, as might be expected. The rate scores 
derived from this test also correlate very highly with the general 
composite scores. In fact, with the exception of Pressey's test, the 
correlation of single tests with the composite scores is very high. It 
appears, therefore, that each of the tests yields rate scores which
may be accepted as correlating very highly with the true rate of
silent reading. The scores derived from the Experimental Repro- 
duction Tests in the fourth grade correlate more highly with com- 
posite B than those derived from Brown's Silent Reading Test. In 
the seventh grade, the correlations between Starch's test and com- 
posite B are slightly higher than those for the Experimental Repro- 
duction Tests. It appears, however, that the Experimental Repro- 
duction Tests yield approximately as valid measurements of ability 
to comprehend as are secured by means of the other tests which, 
presumably, have been devised with greater care. 

SUMMARY OF CONCLUSIONS. 

1. The scoring of reproductions is so highly subjective that a 
silent reading test requiring reproduction of material read cannot be 
considered satisfactory. 

2. Brown's Silent Reading Test is very unreliable for both 
comprehension and rate. This is true, even when the average of 
two independent scores is used as a measure of comprehension. 

3. The correlation between scores yielded by the memory 
test and comprehension scores based upon reproductions is only 
slightly higher than that existing between the scores derived from 
the memory test and the comprehension scores yielded by Monroe's 
Standardized Silent Reading Test. This makes doubtful the usual 
assumption that measures of comprehension based upon reproduc- 
tions are affected by the pupil's ability to remember. 

4. Correlation between extent of vocabulary and ability to read 
is surprisingly low. There is little, if any, relation between these 
two abilities. 

5. The intercorrelations between tests indicate that different 
tests measure slightly different traits; but it is surprising to find, in 
a few instances, a high degree of correlation existing between scores 
yielded by tests which exhibit marked differences in structure. 

6. There appears to be a higher degree of correlation between 
the story value of written compositions and comprehension than 
between the number of words written and the measures of compre- 
hension. This is true even when the measures of comprehension 
are based upon reproductions and the reproductions are described 
in terms of the number of words or number of ideas reproduced. 

7. In the measurement of rate of silent reading, the Courtis 
Silent Reading Test No. 2, is shown to have the highest degree 
of reliability. Monroe's Standardized Silent Reading Tests, which
were intended to yield only very crude measures of rate of silent
reading, are shown to be among the most reliable tests. 

8. In measuring comprehension, the Courtis Silent Reading 
Test, No. 2, is the most reliable.

9. The coefficient of reliability is shown not to be a satisfactory 
measure of reliability. 

10. Comparisons with teachers' ratings indicate that, in the 
fourth grade, teachers tend to judge silent reading ability on the 
basis of the pupil's ability to answer questions. In the seventh grade, 
teachers give greater weight to the pupil's ability to reproduce or 
tell what they have read. 

Correlation with composites. In Tables XVI and XVII, the 
corrected coefficients of correlation of each test with the composite 
scores are given. These, in general, are larger than the correlations 
between single tests. Monroe's Standardized Silent Reading Test 
correlates very highly with composite A. This means that this test, 
which is very simple to administer, yields measures of essentially 
the same traits as are secured by means of this composite, which 
in the fourth grade involves three scores and in the seventh, two 
scores. The correlation with composite C and with the general com- 
posite is also high. In fact, with the partial exception of Starch's 
Test, no other correlations are as high as these two composites of 
the Monroe Silent Reading Tests. It, therefore, appears, as judged 
by composite scores, that this test yields measures of comprehen- 
sion which agree more closely with the composite measures secured 
from this group of tests than any other single test. The correla- 
tions for rate are also high. 





BULLETINS OF THE BUREAU OF EDUCATIONAL RESEARCH, COLLEGE OF
EDUCATION, UNIVERSITY OF ILLINOIS, URBANA, ILLINOIS.

Prices are in cents.

No. 1. Buckingham, B. R. Bureau of Educational Research,
Announcement, 1918-19. Price 15.

No. 2. First Annual Report. Price 25.

No. 3. Bamesberger, Velda C. Standard Requirements for
Memorizing Literary Material. Price 50.

No. 4. Holley, Charles E. Mental Tests for School Use.
(Out of print) Price 50.

No. 5. Monroe, Walter S. Report of Division of Educational
Tests for 1919-20. Price 25.

No. 6. Monroe, Walter S. The Illinois Examination. Price 50.

No. 7. Monroe, Walter S. Types of Learning Required of
Pupils in the Seventh and Eighth Grades and in the
High School. Price 15.

No. 8. Monroe, Walter S. A Critical Study of Certain Silent
Reading Tests. Price 50.

No. 9. Monroe, Walter S. Written Examinations and Their
Improvement. (In preparation) Price 50.


